Abstract

Inspired by the ability of biological entities to achieve reciprocity in the course of evolution, this paper
considers a conjecture-based distributed learning approach that enables autonomous nodes to independently
optimize their transmission probabilities in random access networks. We model the interaction
among multiple self-interested nodes as a game. It is well-known that the Nash equilibria in this game
result in zero throughput for all the nodes if they take myopic best responses, thereby leading to a
network collapse. This paper enables nodes to behave as intelligent entities which can proactively gather
information, form internal conjectures on how their competitors would react to their actions, and update
their beliefs according to their local observations. In this way, nodes are able to autonomously "learn"
the behavior of their competitors, optimize their own actions, and eventually cultivate reciprocity in the
random access network. To characterize the steady-state outcome of this "evolution", the conjectural 
equilibrium is introduced. Inspired by the biological phenomena of "derivative action" and "gradient 
dynamics", two distributed conjecture-based action update mechanisms are proposed to stabilize the 
random access network. The sufficient conditions that guarantee the proposed conjecture-based learning 
algorithms to converge are derived. Moreover, it is analytically shown that all the achievable operating 
points in the throughput region are essentially stable conjectural equilibria corresponding to different 
conjectures. We also investigate how the conjectural equilibrium can be selected in heterogeneous 
networks and how the proposed methods can be extended to ad-hoc networks. Numerical simulations 
verify that the proposed algorithms significantly outperform existing protocols, such as the IEEE 802.11
Distributed Coordination Function (DCF) protocol and the priority-based fair medium access control (P-MAC)
protocol, in terms of throughput, fairness, convergence, and stability.
I. Introduction

Multi-user communication systems represent competitive environments, where networked devices com- 
pete for the limited available resources and wireless spectrum. Most of these devices are autonomous, and 
must adapt to the surrounding environment in a totally distributed and unsupervised manner. Recently, a 
number of emerging approaches have been considered to better understand, analyze, and characterize the 
dynamics of multi-user interactions among communication devices using biologically-inspired methods 
[1]- [3]. The scientific rationale for this is that, as communication networks expand in size, the network 
"entities" grow in their diversity and ability to gather and process information, and hence, networks will 
increasingly come to resemble the models of interaction and self-organization of biological systems. 

It is well-known that many biological species exhibit various levels of learning ability, which enable
them to survive and evolve in the process of natural selection [4]-[6]. In particular, game theory has been
used for a long time as a descriptive tool for characterizing the interaction of biological agents learning 
to improve their utility, e.g. the chance of "survival" for the selfish genes or organisms [4] [5]. Various 
learning models have been developed, largely in response to observations by biologists about animal and 
human behavior [4] [6]. Several particular dynamic adjustment processes have received specific attention 
in the theory of learning and evolution. For example, replicator dynamics models how the share of 
the population using a certain survival strategy grows at a rate proportional to that strategy's current 
payoff. In the partial best response dynamics, a fixed portion of the population switches during each time 
period from their current action to a best response to the aggregate statistic of the population play in 
the previous period. As a result, the cooperative or altruistic behavior may be favored and reciprocity is 
therefore established in the course of evolution [7]. 

Several learning models have been applied to solve multi-user interaction problems in both wireline 
and wireless network settings [1] [8]- [10]. For instance, appropriate learning solutions are studied in 
distributed environments consisting of agents with very limited information about their opponents, such as 
the Internet [8]. A class of no-regret learning algorithms is proposed in the stochastic game framework to 
enable cognitive radio devices to learn from the environment and efficiently utilize the spectrum resource 
[1]. A reinforcement learning algorithm is proposed in the repeated game setting to design power control 
in wireless ad-hoc networks [9], where it is shown that the learning dynamics eventually converge to 
Nash equilibrium (NE) and achieve satisfactory performance. A novel learning approach is proposed for 
wireless users to dynamically and efficiently share spectrum resources by considering the time-varying 
properties of their traffic and channel conditions [10]. 
This paper is concerned with developing distributed learning mechanisms in random access communication
networks from not only the biological, but also the game-theoretic perspective. It is well-known that
myopic selfish behavior is detrimental in random access communication networks [14]. To avoid a network 
collapse and encourage cooperation, we adopt the conjecture-based model introduced by Wellman and 
others [18] [19] and enable the cognitive communication devices to build belief models about how their 
competitors' reactions vary in response to their own action changes. The belief functions of the wireless 
devices are inspired by the evolutionary biological concept of reciprocity, which refers to interaction 
mechanisms in which the emergence of cooperative behavior is favored by the probability of future mutual 
interactions [5] [7]. Specifically, by deploying such a behavior model, devices will no longer adopt myopic, 
selfish, behaviors, but rather they will form beliefs about how their actions will influence the responses 
of their competitors and, based on these beliefs, they will try to maximize their own welfare. The steady
state of such a play among belief-forming devices can be characterized as a conjectural equilibrium (CE).
At the equilibrium, devices compensate for their lack of information by forming an internal representation 
of the opponents' behavior and preferences, and using these "conjectured responses" in their personal 
optimization program [19]. More importantly, we show that the reciprocity among these self-interested 
devices can be sustained. 

In particular, the main contributions of this paper are as follows. First, to cultivate cooperation in random 
access networks, we enable self-interested autonomous nodes to form independent linear beliefs about
how their rivals' actions vary as a function of their own actions. Inspired by two biological phenomena,
namely "derivative action" in biological motor control system [20] [21] and "gradient dynamics" in 
biological mutation [22]- [24], we design two simple distributed learning algorithms in which all the 
nodes' beliefs and actions will be revised by observing the outcomes of past mutual interaction over 
time. Both conjecture-based algorithms require little information exchange among different nodes and 
the internal computation for each node is very simple. For both algorithms, we investigate the stability of 
different operating points and derive sufficient conditions that guarantee their global convergence, thereby 
establishing the connection between the dynamic belief update procedures and the steady-state CE. We 
prove that all the operating points in the throughput region are stable CE and reciprocity can be eventually 
sustained via the proposed bio-inspired evolution. We also provide an engineering interpretation of the 
proposed bio-inspired design to clarify the similarities and differences between the proposed algorithms 
and existing protocols, e.g. the IEEE 802.11 DCF.

Second, we investigate the relationship between the parameter initialization of beliefs and the Pareto
efficiency of the achieved CE. In the economic market context, it has been shown that adjustment processes
based on conjectures and individual optimization may sometimes be driven to Pareto-optimality [25]. To
the best of our knowledge, this is the first attempt to investigate the Pareto efficiency of the conjecture-based
approach in communication networks. Importantly, it is shown that, regardless of the number of
nodes, there always exist certain belief configurations such that the proposed distributed bio-inspired 
learning algorithms can operate arbitrarily close to the Pareto boundary of the throughput region while 
approximately maintaining the weighted fairness across the entire network. Our investigation provides 
useful insights that help to define convergent dynamic adaptation schemes that are apt to drive distributed 
random access networks towards efficient, stable, and fair configurations. 

The rest of this paper is organized as follows. Section II presents the system model of random access 
networks, reviews the existing game theoretic solutions, and introduces the concept of CE. Based on 
the intuition gained from "derivative action" and "gradient dynamics", Section III develops two simple 
distributed learning algorithms in which nodes form dynamic conjectures and optimize their actions 
based on their conjectures. The stability of different CE and the condition of global convergence are 
established. This section also shows that nodes' conjectures can be configured to stably operate at any 
point that is arbitrarily close to the Pareto frontier in throughput region. Section IV addresses the topics 
of equilibrium selection in heterogeneous networks and presents possible extension to ad-hoc networks. 
Numerical simulations are provided in Section V to compare the proposed algorithms with the IEEE
802.11 DCF protocol and the P-MAC protocol. Conclusions are drawn in Section VI.

II. System Description and Conjectural Equilibrium 

In this section, we describe the system model of random access networks and define the investigated 
random access game. We also discuss the existing game-theoretic solutions and introduce the concept of 
conjectural equilibrium. 

A. System Model of Random Access Networks 

Following [12] [13], we model the interaction among multiple autonomous wireless nodes in random 
access networks as a random access game. 

As shown in Fig. 1, consider a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of wireless nodes, where each node represents a
transmitter-receiver pair (link). We define $Tx_k$ as the transmitter node of link $k$ and $Rx_k$ as the receiver
node of link $k$. We first assume a single-cell wireless network, where every node can hear every other
node in the network, and we will address the ad-hoc network scenario in Section IV-B. The system
operates in discrete time with evenly spaced time slots [30] [33]. We assume that all nodes always have
a data packet to transmit at each time slot (i.e. we investigate the saturated traffic scenario¹), and that the
network is noise free, so packet loss occurs only due to collision. The action of a node in this game is
to select its transmission probability, and node $k$ independently attempts transmission of a packet
with transmit probability $p_k$. The action set available to node $k$ is $P_k = [0, 1]$ for all $k \in \mathcal{K}$.² Once the
nodes decide their transmission probabilities, based on which they transmit their packets, an action profile
is determined. We denote the action profile in the random access game as a vector $\mathbf{p} = (p_1, \ldots, p_K)$ in
$P = P_1 \times \cdots \times P_K$. Then the throughput of node $k$ is given by³

$$u_k(\mathbf{p}) = p_k \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i). \tag{1}$$

To capture the performance tradeoff in the network, the throughput (payoff) region is defined as
$\mathcal{T} = \{(u_1(\mathbf{p}), \ldots, u_K(\mathbf{p})) \mid \exists\, \mathbf{p} \in P\}$. The random access game can be formally defined by the tuple
$\Gamma = \langle \mathcal{K}, (P_k), (u_k) \rangle$ [26]. Denote the transmission probabilities of all nodes but $k$ by
$\mathbf{p}_{-k} = (p_1, \ldots, p_{k-1}, p_{k+1}, \ldots, p_K)$. From (1), we can see that node $k$'s throughput depends not only on its own
transmission probability $p_k$, but also on the other nodes' transmission probabilities $\mathbf{p}_{-k}$.
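
As a small illustration of the throughput model in (1), the following Python sketch evaluates the throughput of every node for a given action profile. The function name `throughput` and the example profile are illustrative assumptions and do not come from the paper.

```python
from math import prod

def throughput(p):
    """Return [u_1(p), ..., u_K(p)] following (1): node k succeeds when it
    transmits and every other node stays silent in the same slot."""
    return [
        p_k * prod(1.0 - p_i for i, p_i in enumerate(p) if i != k)
        for k, p_k in enumerate(p)
    ]

# Hypothetical three-node action profile, chosen only for illustration.
p = [0.2, 0.3, 0.1]
u = throughput(p)
```

Raising any single entry of `p` lowers the throughput of all other nodes, which is the coupling through $\mathbf{p}_{-k}$ noted above.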

B. Existing Solutions 

The throughput tradeoff and stability of random access networks have been extensively studied from the
game theoretic perspective [11]-[17]. This subsection briefly reviews these existing results and highlights
the advantages and disadvantages of different approaches.

In the random access game, one of the most investigated problems is whether or not a Nash equilibrium 
exists. The definition of Nash equilibrium is given as follows [26]. 

Definition 1: A profile $\mathbf{p}$ of actions constitutes a Nash equilibrium of $\Gamma$ if $u_k(p_k, \mathbf{p}_{-k}) \geq u_k(p'_k, \mathbf{p}_{-k})$
for all $p'_k \in P_k$ and $k \in \mathcal{K}$.

1 This paper focuses on the saturated system because we are interested in throughput maximization. The analysis can be
extended to investigate non-saturated networks where the incoming packets of the individual nodes' queues arrive at finite
rates.

2 The action set can be alternatively defined to be $P_k = [P_k^{\min}, P_k^{\max}]$ and the analysis in this paper still applies.

3 This throughput model assumes that time is slotted and all packets are of equal length. We use this model for theoretic
analysis. The throughput of the scenarios in which packet lengths are not equal, e.g. the IEEE 802.11 DCF, will be addressed
in Section V.

The NE of the investigated random access game has been addressed in the similar context of CSMA/CA
networks, where selfish nodes deliberately control their random deferment by altering their contention
windows [14]. Specifically, the transmission probability $p_k$ in our model can be related to the contention
window $CW_k$ in the CSMA/CA protocol, where $p_k = 2/(1 + CW_k)$. It has been shown in [14] that at the
NE, at least one selfish node will set $CW_k = 1$ (i.e. always transmit). If more than one selfish node sets
its contention window to 1, it will cause zero throughput for all the nodes in the system. This kind of
result is known as the tragedy of the commons. We can see that myopic selfish behavior is detrimental in
random access scenarios and novel mechanisms are required to encourage cooperative behavior among
the self-interested devices. In addition, the existence of and convergence to the NE in random access
games have also been studied in other scenarios, where individual nodes have utility functions that are
different from (1) [11] [12]. For example, the nodes in [11] adjust their transmission probabilities in
an attempt to attain their desired throughputs. A local utility function is found for exponential backoff-based
MAC protocols, based on which these protocols can be reverse-engineered in order to stabilize the
network [12]. However, due to the inadequate coordination or feedback mechanism in these protocols,
Pareto optimality of the throughput performance cannot be guaranteed.
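
The collapse under myopic play can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the helper names are hypothetical, all nodes are assumed to update simultaneously, and a node whose payoff is zero regardless of its action is assumed to keep its current probability. Because the throughput in (1) is linear in a node's own transmission probability, the myopic best response is to always transmit whenever the other nodes leave the channel free with positive probability.

```python
from math import prod

def throughput(p):
    """Throughput of every node under the collision model (1)."""
    return [
        p_k * prod(1.0 - p_i for i, p_i in enumerate(p) if i != k)
        for k, p_k in enumerate(p)
    ]

def myopic_best_response(p, k):
    """u_k is linear in p_k with slope s_k >= 0, so p_k = 1 is optimal when s_k > 0.
    If s_k == 0 every action yields zero payoff; keeping p[k] is an assumption."""
    s_k = prod(1.0 - p_i for i, p_i in enumerate(p) if i != k)
    return 1.0 if s_k > 0 else p[k]

p = [0.2, 0.3, 0.1]                                           # hypothetical starting profile
p_next = [myopic_best_response(p, k) for k in range(len(p))]  # simultaneous update
u_next = throughput(p_next)
```

After one round every node transmits with probability 1 and all throughputs drop to zero, which is the tragedy of the commons described above.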

Several recent works also investigate how to design new distributed algorithms that provably converge to 
the Pareto boundary of the network throughput region [14]- [16]. A distributed protocol is proposed in [14] 
to guide multiple selfish nodes to a Pareto-optimal NE by including penalties into their utility functions. 
However, the penalties must be carefully chosen. In [15], the utility maximization is solved using the 
dual decomposition technique by enabling nodes to cooperatively exchange coordination information 
among each other. Furthermore, it is shown in [16] that network utility maximization in random access 
networks can be achieved without real-time message passing among nodes. The key idea is to estimate 
the other nodes' transmission probabilities from local observations, which in fact increases the internal 
computational overhead of individual nodes. 

As discussed before, the goal of this paper is to design a simple distributed random access algorithm 
that requires limited information exchanges among nodes and also stabilizes the entire network. More 
importantly, this algorithm should be capable of achieving high efficiency and of differentiating among 
heterogeneous nodes carrying various traffic classes with different quality of service requirements. As we 
will show later, the game-theoretic concept of conjectural equilibrium provides such an elegant solution. 

C. Conjectural Equilibrium 

In game-theoretic analysis, conclusions about the reached equilibria are based on assumptions about 
what knowledge the players possess. For example, the standard NE strategy assumes that every player 
believes that the other players' actions will not change at NE. Therefore, each player chooses to myopically
maximize its immediate payoff [26]. In this sense, the players operating at equilibrium can be viewed as
decision makers behaving optimally with respect to their beliefs about the strategies of other players. 

To rigorously define CE, we need to include two new elements $\mathcal{S}$ and $s$ and, based on this, reformulate
the random access game as $\Gamma' = \langle \mathcal{K}, (P_k), (u_k), (\mathcal{S}_k), (s_k) \rangle$ [18]. $\mathcal{S} = \times_{k \in \mathcal{K}} \mathcal{S}_k$ is the state space, where $\mathcal{S}_k$
is the part of the state relevant to node $k$. Specifically, the state in the random access game is defined
as the contention probability that nodes experience. The utility function $u_k$ is a map from the nodes'
state space to real numbers, $u_k : \mathcal{S}_k \times P_k \to \mathbb{R}$. The state determination function $s = \times_{k \in \mathcal{K}} s_k$ maps
joint action to state, with each component $s_k : P \to \mathcal{S}_k$. Each node cannot directly observe the actions
(transmission probabilities) chosen by the others, and each node has some belief about the state that
would result from performing its available actions. The belief function $\tilde{s}_k$ is defined to be $\tilde{s}_k : P_k \to \mathcal{S}_k$
such that $\tilde{s}_k(p_k)$ represents the state that node $k$ believes would result if it selects action $p_k$. Notice
that the beliefs are not expressed in terms of other nodes' actions and preferences, and the multi-user
coupling in these beliefs is captured directly by individual nodes forming conjectures of the effects of
their own actions. Moreover, each node chooses the action $p_k \in P_k$ if it believes that this action will
maximize its utility.

Definition 2: In the game $\Gamma'$ defined above, a configuration of belief functions $(\tilde{s}_1^*, \ldots, \tilde{s}_K^*)$ and a joint
action $\mathbf{p}^* = (p_1^*, \ldots, p_K^*)$ constitute a conjectural equilibrium, if for each $k \in \mathcal{K}$,

$$\tilde{s}_k^*(p_k^*) = s_k(p_1^*, \ldots, p_K^*) \quad \text{and} \quad p_k^* = \arg\max_{p_k \in P_k} u_k(\tilde{s}_k^*(p_k), p_k).$$

From the above definition, we can see that, at CE, all nodes' expectations based on their beliefs are 
realized and each node behaves optimally according to its expectation. In other words, nodes' beliefs are 
consistent with the outcome of the play and they behave optimally with respect to their beliefs. The key 
challenges are how to configure the belief functions such that reciprocal behavior is encouraged and how 
to design the evolution rules such that the network can dynamically converge to a CE having satisfactory 
performance. Section III provides bio-inspired solutions for these problems in random access games. 

III. Distributed Bio-inspired Learning 

In this section, to promote reciprocity, we design a prescribed rule for each node to configure its 
belief about its expected contention of the wireless network as a linear function of its own transmission 
probability. It is shown that all the achievable operating points in the throughput region $\mathcal{T}$ are CE by
deploying these belief functions. Furthermore, inspired by the biological mechanisms "derivative action"
and "gradient dynamics", we propose two distributed learning algorithms for these nodes to dynamically
achieve the CE. We provide sufficient conditions that guarantee the stability and convergence of
the CE. We also discuss the similarities and differences between these bio-inspired algorithms and the
existing well-known protocols. Finally, it is proven that any Pareto-inefficient operating point is a stable
CE, i.e. we can approach arbitrarily close to the Pareto frontier of the throughput region $\mathcal{T}$.

A. Individual Behavior 

As discussed before, both the state space and belief functions need to be defined in order to investigate
the existence of CE. In the random access game, we define the state $s_k = \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i)$ to be the
contention measure signal representing the probability that all nodes except node $k$ do not transmit. This
is because, besides its own transmission probability, node $k$'s throughput only depends on the probability that
the remaining nodes do not transmit. We can see that state $s_k$ indicates the aggregate effects of the other
nodes' joint actions on node $k$'s payoff. In practice, it is hard for wireless nodes to compute the exact
transmission probabilities of their opponents [16]. Therefore, we assume that $s_k$ is the only information
that node $k$ has about the contention level of the entire network, because it is a metric that node $k$ can
easily compute based on local observations. Specifically, from node $k$'s viewpoint, the probability of
experiencing an idle time slot is $p_k^{idle} = (1 - p_k) s_k$. Let $n_k^{idle}$ denote the number of time slots between
any two consecutive idle time slots. $n_k^{idle}$ has an independent identically distributed geometric distribution
with probability $p_k^{idle}$. Therefore, we have $p_k^{idle} = 1/(1 + \bar{n}_k^{idle})$, where $\bar{n}_k^{idle}$ is the mean value of $n_k^{idle}$
and can be locally estimated by node $k$ through its observation of the channel contention history. Since
node $k$ knows its own transmission probability $p_k$, it can estimate $s_k$ using $s_k = 1/[(1 + \bar{n}_k^{idle})(1 - p_k)]$.
Notice that the action available to node $k$ is to choose the transmission probability $p_k \in P_k$. By the
definition of belief function, we need to express the expected contention measure $\tilde{s}_k$ as a function of its
own transmission probability $p_k$. The simplest approach is to deploy linear belief models, i.e. node $k$'s
belief function takes the form

$$\tilde{s}_k(p_k) = \bar{s}_k - a_k (p_k - \bar{p}_k), \tag{2}$$

for $k \in \mathcal{K}$. The values of $\bar{s}_k$ and $\bar{p}_k$ are specific states and actions, called reference points [25], and $a_k$
is a positive scalar. In other words, node $k$ assumes that other nodes will observe its deviation from
its reference point $\bar{p}_k$ and that the aggregate contention probability deviates from the reference point $\bar{s}_k$ by
a quantity proportional to the deviation $p_k - \bar{p}_k$. How to configure $\bar{s}_k$, $\bar{p}_k$, and $a_k$ will be addressed
in the rest of this paper. The reasons why we focus on the linear beliefs represented in (2) are two-fold.
First, the linear form represents the simplest model based on which a user can model the impact
of its environment. As we will show later in Section III-E, building and optimizing over such simple
beliefs is sufficient for the network to achieve almost any operating point in the throughput region as a
stable CE. Second, the conjecture functions deployed by the wireless users are based on the concept of
reciprocity [5] [7], which was developed in evolutionary biology, and refers to interaction mechanisms in
which the evolution of cooperative behavior is favored by the probability of future mutual interactions.
Similarly, in single-hop wireless networks, the devices repeatedly interact when accessing the channel. If
they disregard the fact that they have a high probability to interact in the future, they will act myopically,
which will lead to a tragedy of the commons (the zero-payoff Nash equilibrium). However, if they recognize
that their probability of interacting in the future is high, they will consider their impact on the network
state, which is captured in the belief function by the positive $a_k$.
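
The local estimate of $s_k$ from idle-slot observations described above can be sketched as a slot-level simulation. This is an illustrative sketch rather than the procedure used in the paper: the function name `estimate_s_k`, the number of simulated slots, the random seed, and the example profile are all assumptions.

```python
import random

def estimate_s_k(p, k, num_slots=100_000, seed=0):
    """Estimate s_k = prod_{i != k} (1 - p_i) from the channel history seen by node k."""
    rng = random.Random(seed)
    gaps = []        # number of busy slots between consecutive idle slots
    busy_run = None  # stays None until the first idle slot has been observed
    for _ in range(num_slots):
        # A slot is idle when no node transmits (node i transmits w.p. p[i]).
        idle = all(rng.random() >= p_i for p_i in p)
        if idle:
            if busy_run is not None:
                gaps.append(busy_run)
            busy_run = 0
        elif busy_run is not None:
            busy_run += 1
    n_bar = sum(gaps) / len(gaps)    # sample mean of n_k^idle
    p_idle = 1.0 / (1.0 + n_bar)     # p_k^idle = 1 / (1 + mean gap)
    return p_idle / (1.0 - p[k])     # divide out node k's own silence probability

p = [0.2, 0.3, 0.1]                  # hypothetical profile; true s_0 = 0.7 * 0.9 = 0.63
s0_hat = estimate_s_k(p, k=0)
```

The estimate only uses quantities that node $k$ can observe locally, namely the idle/busy pattern of the channel and its own $p_k$.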

The goal of node $k$ is to maximize its expected throughput $p_k \cdot \tilde{s}_k(p_k)$, taking into account the conjectures
that it has made about the other nodes. Therefore, the optimization a node needs to solve becomes:

$$\max_{p_k \in P_k} \; p_k \left[ \bar{s}_k - a_k (p_k - \bar{p}_k) \right], \tag{3}$$

where the second term is the expected contention measure $\tilde{s}_k(p_k)$ if node $k$ transmits with probability $p_k$.
The product of $p_k$ and $\tilde{s}_k(p_k)$ gives the expected throughput for $p_k \in P_k$. For $a_k > 0$, node $k$ believes that
increasing its transmission probability will increase its experienced contention probability. The optimal
solution of (3) is given by

$$p_k^* = \min\left\{ \frac{\bar{p}_k}{2} + \frac{\bar{s}_k}{2 a_k},\; 1 \right\}. \tag{4}$$

In the following, we first show that forming simple linear beliefs in (2) can cause all the operating
points in the achievable throughput region to be CE.

Theorem 1: All the operating points in the throughput region $\mathcal{T}$ are conjectural equilibria.

Proof: For each operating point $(\tau_1, \ldots, \tau_K)$ in the throughput region $\mathcal{T}$, there exists at least one joint
action profile $(p_1^*, \ldots, p_K^*) \in P$ such that $\tau_k = u_k(\mathbf{p}^*)$, $\forall k \in \mathcal{K}$. We consider setting the parameters in
the belief functions to be:

$$\bar{p}_k = p_k^*, \quad \bar{s}_k = \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^*), \quad a_k^* = \frac{\prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^*)}{p_k^*}. \tag{5}$$

With these parameters, $\tilde{s}_k(p_k^*) = s_k(p_1^*, \ldots, p_K^*)$ and $p_k^* = \arg\max_{p_k \in P_k} u_k(\tilde{s}_k(p_k), p_k)$. Therefore, this
configuration of the belief functions and the joint action $\mathbf{p}^* = (p_1^*, \ldots, p_K^*)$ constitute the CE that results
in the throughput $(\tau_1, \ldots, \tau_K)$.⁴ ■

4 By the definition of CE, the configuration of the linear belief functions is a key part of CE. Since this paper focuses on the
linear belief functions defined in (2), we will simply state that the joint action $\mathbf{p}^*$ is a CE hereafter for ease of presentation.
Theorem 1 establishes the existence of CE, i.e. for a particular $\mathbf{p}^* \in P$, how to choose the parameters
$\{\bar{s}_k, \bar{p}_k, a_k\}_{k=1}^K$ such that $\mathbf{p}^*$ is a CE. However, it neither tells us how these CE can be achieved and
sustained in the dynamic setting nor clarifies how different belief configurations can result in various CE.
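
The existence argument can be checked numerically. The sketch below is an illustration under stated assumptions: it uses linear beliefs with reference point $(\bar{s}_k, \bar{p}_k)$ set to the actual state and action at $\mathbf{p}^*$ and slope $a_k = \bar{s}_k / p_k^*$, takes the conjectured best response to be the maximizer of the concave quadratic $p_k[\bar{s}_k - a_k(p_k - \bar{p}_k)]$ clipped at 1, and requires $0 < p_k^* < 1$. The function names and the example profile are hypothetical.

```python
from math import prod

def configure_beliefs(p_star):
    """Return (s_bar, p_bar, a) per node so that p_star becomes a CE."""
    params = []
    for k, p_k in enumerate(p_star):
        s_bar = prod(1.0 - p_i for i, p_i in enumerate(p_star) if i != k)
        params.append((s_bar, p_k, s_bar / p_k))
    return params

def best_response(s_bar, p_bar, a):
    """Maximizer of p * (s_bar - a * (p - p_bar)) over [0, 1] for a > 0.
    Setting the derivative s_bar + a * p_bar - 2 * a * p to zero gives the
    unconstrained optimum p_bar / 2 + s_bar / (2 * a)."""
    return min(p_bar / 2.0 + s_bar / (2.0 * a), 1.0)

p_star = [0.2, 0.3, 0.1]             # hypothetical target operating point
beliefs = configure_beliefs(p_star)
responses = [best_response(*b) for b in beliefs]
```

Each node's conjectured best response reproduces its entry of `p_star`, and the believed state at `p_star` equals the reference state, so both CE requirements hold for this example.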

In distributed learning scenarios, nodes learn when they modify their conjectures based on their new
observations. Specifically, we first allow the nodes to revise their reference points based on their past
local observations. Let $s_k^t$, $p_k^t$, $\tilde{s}_k^t$, $\bar{s}_k^t$, $\bar{p}_k^t$ be user $k$'s state, transmission probability, belief function, and
reference points at stage $t$,⁵ in which $s_k^t = \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^t)$. We propose a simple rule for individual
nodes to update their reference points. At stage $t$, node $k$ sets its $\bar{s}_k^t$ and $\bar{p}_k^t$ to be $s_k^{t-1}$ and $p_k^{t-1}$. In other
words, node $k$'s conjectured utility function at stage $t$ is

$$u_k^t(\tilde{s}_k^t(p_k), p_k) = p_k \left[ \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^{t-1}) - a_k (p_k - p_k^{t-1}) \right]. \tag{6}$$

The remainder of this paper will investigate the dynamic properties of the resulting operating points and
the performance trade-off among multiple competing nodes. In particular, for fixed $\{a_k\}_{k=1}^K$, Sections
III-B and III-C will embed the above individual optimization scheme in two different distributed learning
processes in which all the nodes update their transmission probabilities over time. Section III-E further
allows individual nodes to adaptively update their parameters $\{a_k\}_{k=1}^K$ such that desired efficiency can be
attained. For given $\{a_k\}_{k=1}^K$, Section IV-A will derive a quantitative description of the resulting CE $\mathbf{p}^*$.

B. A Best Response Learning Algorithm 

Our first algorithm in establishing reciprocity through an evolution process is inspired by the "derivative
action", which is a key component of biological motor control system models, e.g. cerebellar control over
arm, hand, truncal, and leg movements [20] [21]. Specifically, during limb movements, high frequency
differential (velocity-like) signals after filtering due to biological sensors are attributed to lateral cerebellum
as part of the input for cerebellar control. The classical control interpretation of "derivative action" is
that the first-order derivative term serves as a short term prediction of the measured zero-order variable.
For example, in a swing leg control, the velocity-like signals enable a cerebrocerebellar channel to better
locate the ankle (or foot) position in front of the hip position during the swing phase. The conjecture-based
approach is very similar in spirit to the aforementioned biological motor control models and
the standard proportional integral derivative (PID) controllers in engineered systems [38]. In particular,
the derivative term is substituted by a node's internal belief of how its own action will impact the other
nodes' behavior. Our first learning algorithm adopts the simplest update mechanism in which each node
adjusts its transmission probability using the best response that maximizes its conjectured utility function
(6). Therefore, at stage $t$, node $k$ chooses a transmission probability

$$p_k^t = \arg\max_{p_k \in P_k} u_k^t(\tilde{s}_k^t(p_k), p_k) = \min\left\{ \frac{p_k^{t-1}}{2} + \frac{\prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^{t-1})}{2 a_k},\; 1 \right\}. \tag{7}$$

In this regard, the use of "derivative action" by an agent can be interpreted as using the best response to
the forecasted effect of all the opponents' strategies.

5 This paper assumes the persistence mechanism for contention resolution except in Section III-D. In the persistence mechanism,
each wireless node maintains a persistence probability and accesses the channel with this probability [37]. A stage contains
multiple time slots. The nodes estimate the contention level in the network and update their persistence probabilities in the
"stage-by-stage" manner. The superscript $t$ in this paper represents the numbering of the stages unless specified.

Algorithm 1: A Distributed Best Response Learning Algorithm for Random Access
1: Initialize: $t = 0$, the transmission probability $p_k^0 \in [0, 1]$, and the parameter $a_k > 0$ in node $k$'s belief
   function, $\forall k \in \mathcal{K}$.
2: procedure
3:   Locally at each node $k$, iterate through $t$:
4:   Set $t \leftarrow t + 1$.
5:   for all $k \in \mathcal{K}$ do
6:     At stage $t$, $p_k^t \leftarrow \min\{ p_k^{t-1}/2 + \prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i^{t-1})/(2 a_k),\ 1 \}$.
7:   end for
8:   Node $k$ decides if it will transmit data with probability $p_k^t$ (or equivalently, maintain a window
     size of $CW_k^t = 2/p_k^t - 1$) for all the time slots during stage $t$.
9: end procedure
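
The update in Algorithm 1 can be sketched in Python as follows. This is a minimal, idealized sketch: each node is assumed to know the exact contention measure of the previous stage (rather than an estimate from channel observations), all nodes update synchronously, and the function names, starting point, and number of stages are illustrative assumptions. The slopes `a` are chosen here so that a hypothetical target profile `p_star` is a fixed point of the update.

```python
from math import prod

def best_response_step(p, a):
    """One stage of the best response update: every node maximizes its conjectured
    utility given the previous-stage profile p and its belief slope a[k] > 0."""
    return [
        min(p[k] / 2.0
            + prod(1.0 - p_i for i, p_i in enumerate(p) if i != k) / (2.0 * a[k]),
            1.0)
        for k in range(len(p))
    ]

def run_best_response(p0, a, num_stages=500):
    """Iterate the stage update from the initial profile p0."""
    p = list(p0)
    for _ in range(num_stages):
        p = best_response_step(p, a)
    return p

def contention_window(p_k):
    """Equivalent window size CW = 2 / p - 1 for a transmission probability p."""
    return 2.0 / p_k - 1.0

p_star = [0.2, 0.3, 0.1]   # hypothetical target operating point
a = [prod(1.0 - p_i for i, p_i in enumerate(p_star) if i != k) / p_star[k]
     for k in range(len(p_star))]
p_final = run_best_response([0.5, 0.5, 0.5], a)
```

For this particular example the iterates settle at `p_star` even though the starting point is far from it; whether this happens in general depends on the stability and convergence conditions discussed next.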



The detailed description of the entire distributed best response learning procedure is summarized in
Algorithm 1 and it is also pictorially illustrated in Fig. 2. Next, we are interested in deriving the limiting
behavior, e.g. stability and convergence, of this algorithm. For ease of illustration, the sufficient conditions
for stability and convergence throughout this paper are expressed in terms of $\{p_k\}_{k=1}^K$ and $\{a_k\}_{k=1}^K$,
respectively. The mapping from $\{p_k\}_{k=1}^K$ to $\{a_k\}_{k=1}^K$ is given in (5) and the mapping from $\{a_k\}_{k=1}^K$ to
$\{p_k\}_{k=1}^K$ will be addressed in Section IV-A.

1) Local Stability: Although Theorem 1 indicates that all the points in $\mathcal{T}$ are CE, they may not
necessarily be stable. An unstable equilibrium is not desirable, because any small perturbation might cause
the sequence of iterates to move away from the initial equilibrium. The following theorem describes a
subset in $P$ in which all the points are stable CE.
Theorem 2: For any p* = (p*, . . . ,p* K ) G P, if 

5>£<1, or Y, T ^<hVk£)C, (8) 
fc=i ieic\{k} Pi 

p* is a stable CE for Algorithm 1. 

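Proof: To analyze the stability of different CE, we consider the Jacobian matrix of the self-mapping function in (7). Let $J_{ik}$ denote the element at row $i$ and column $k$ of the Jacobian matrix $J$. If $p_k^{t-1}/2 + \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^{t-1})/(2a_k) < 1$, the Jacobian matrix $J^{BR}$ of (7) is defined as:

$$J_{ik}^{BR} = \frac{\partial p_i^t}{\partial p_k^{t-1}} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[2mm] -\dfrac{\prod_{j \in \mathcal{K}\setminus\{i,k\}}(1 - p_j^{t-1})}{2a_i}, & \text{if } i \neq k. \end{cases} \tag{9}$$

As proven in Theorem 1, for $p^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$ to be a fixed point of the self-mapping function in (7), $a_k$ must be set to $a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)/p_k^*$. It follows that

$$J_{ik}^{BR}\Big|_{p = p^*,\, a = a^*} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[2mm] -\dfrac{p_i^*}{2(1 - p_k^*)}, & \text{if } i \neq k. \end{cases} \tag{10}$$

$p^*$ is stable if and only if the eigenvalues $\{\lambda_k\}_{k=1}^K$ of matrix $J^{BR}$ in (10) are all inside the unit circle of the complex plane, i.e. $|\lambda_k| < 1$, $\forall k \in \mathcal{K}$.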

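From the Gershgorin circle theorem [27], all the eigenvalues $\{\lambda_k\}_{k=1}^K$ of $J^{BR}$ are located in the regions

$$\bigcup_{k=1}^{K} \Big\{ |\lambda - J_{kk}^{BR}| \le \sum_{i \in \mathcal{K}\setminus\{k\}} |J_{ik}^{BR}| \Big\} \quad \text{and} \quad \bigcup_{k=1}^{K} \Big\{ |\lambda - J_{kk}^{BR}| \le \sum_{i \in \mathcal{K}\setminus\{k\}} |J_{ki}^{BR}| \Big\}.$$

Since $J_{kk}^{BR} = 1/2$, these regions can be further simplified as

$$\bigcup_{k=1}^{K} \Big\{ \big|\lambda - \tfrac{1}{2}\big| \le \frac{1}{2} \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big\} \quad \text{and} \quad \bigcup_{k=1}^{K} \Big\{ \big|\lambda - \tfrac{1}{2}\big| \le \frac{1}{2} \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} \Big\}.$$

The radii in the first region are below $1/2$ exactly when $\sum_{k=1}^K p_k^* < 1$, and the radii in the second region are below $1/2$ exactly when the second condition in (8) holds. Hence, if either condition in (8) is satisfied, all the eigenvalues of $J^{BR}$ must fall into the region $|\lambda - \frac{1}{2}| < \frac{1}{2}$, which is located within the unit circle $|\lambda| < 1$. Therefore, $p^*$ is a stable CE. ■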

Remark 1: $p_k^*/(1 - p_i^*)$ can be interpreted as the worst case probability that node $k$ occupies the channel given that node $i$ does not transmit. This metric reflects, from node $k$'s perspective, the impact that node $i$'s evacuation has on the overall congestion of the channel. Therefore, the sufficient conditions in (8) mean that if the system is not overcrowded from all the nodes' perspectives, the corresponding CE is stable. We can see from Theorem 2 that lowering the transmission probabilities helps to stabilize the random access network. The system can accommodate a certain degree of individual nodes' "aggressiveness" while maintaining the network stability. For example, if a node sends its packets with a probability close to 1, the entire network can still be stabilized as long as the other nodes are conservative and set their transmission probabilities small enough. However, if too many "aggressive" nodes with large transmission probabilities coexist, the system stability may collapse, leading to a tragedy of the commons.
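The two sufficient conditions of Theorem 2 are easy to test numerically for a candidate operating point. The sketch below is an illustration only: the function names are hypothetical, the Jacobian uses the fixed-point form with diagonal $1/2$ and off-diagonal entries $-p_i^*/(2(1-p_k^*))$, and all $p_i^* < 1$ is assumed so that no division by zero occurs.

```python
def stable_by_theorem2(p):
    """Return True if either sufficient condition of Theorem 2 holds at p."""
    K = len(p)
    # First condition: the transmission probabilities sum to less than one.
    cond_sum = sum(p) < 1.0
    # Second condition: for every node k, sum over i != k of p_k / (1 - p_i) < 1.
    cond_ratio = all(
        sum(p[k] / (1.0 - p[i]) for i in range(K) if i != k) < 1.0
        for k in range(K)
    )
    return cond_sum or cond_ratio


def jacobian_br(p):
    """Best-response Jacobian at the fixed point p (row i, column k)."""
    K = len(p)
    return [
        [0.5 if i == k else -p[i] / (2.0 * (1.0 - p[k])) for k in range(K)]
        for i in range(K)
    ]


if __name__ == "__main__":
    print(stable_by_theorem2([0.9, 0.02, 0.02]))  # one aggressive node
    print(stable_by_theorem2([0.6, 0.6, 0.6]))    # several aggressive nodes
```

The first example mirrors the remark above: one node close to 1 is tolerated when the others are conservative. A `False` result only means that the sufficient conditions fail; it does not by itself prove instability.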

2) Global Convergence: Note that Theorem 2 only investigates the stability of different fixed points, i.e. Algorithm 1 converges to these points when the initial values are close enough to them. In addition to local stability, we are also interested in characterizing the global convergence of Algorithm 1 when using various $a_k$ to initialize the belief function $\tilde{s}_k$.

Theorem 3: Regardless of any initial value chosen for $\{p_k^0\}_{k=1}^K$, if the parameters $\{a_k\}_{k=1}^K$ in the belief functions $\{\tilde{s}_k\}_{k=1}^K$ satisfy

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} < 1,\ \forall k \in \mathcal{K}, \tag{11}$$

Algorithm 1 converges to a unique CE.

Proof: For $a_k > 1$, the self-mapping function in (7) can be rewritten as

$$p_k^t = \frac{p_k^{t-1}}{2} + \frac{\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^{t-1})}{2a_k}. \tag{12}$$

We can prove for Algorithm 1 the uniqueness of and the convergence to CE by showing that function (12) is a contraction map if the condition in (11) is satisfied.

Let $d(\cdot)$ be the distance function induced by a certain vector norm in the Euclidean space. Consider two sequences of the transmission probability vectors $\{p^0, \ldots, p^{t-1}, p^t, \ldots\}$ and $\{\tilde{p}^0, \ldots, \tilde{p}^{t-1}, \tilde{p}^t, \ldots\}$. We have

$$d(p^t, \tilde{p}^t) = \|p^t - \tilde{p}^t\| \le \|J^{BR}\| \cdot \|p^{t-1} - \tilde{p}^{t-1}\| = \|J^{BR}\| \cdot d(p^{t-1}, \tilde{p}^{t-1}). \tag{13}$$

The matrix norm used here is induced by the same vector norm. Using $\|\cdot\|_1$ for the Jacobian matrix of (12) as given in (9), we have

$$\|J^{BR}\|_1 = \max_{k} \sum_{i=1}^{K} |J_{ik}^{BR}| \le \frac{1}{2} + \frac{1}{2} \max_{k} \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i}. \tag{14}$$

Therefore, if the condition in (11) is satisfied, there exist a constant $q \in [0, 1)$ and a positive $\epsilon$ such that $q = \|J^{BR}\|_1 = 1 - \epsilon < 1$ and $\|p^t - \tilde{p}^t\|_1 \le q \|p^{t-1} - \tilde{p}^{t-1}\|_1$. From the contraction mapping theorem [28], the self-mapping function in (12) has a unique fixed point and the sequence $\{p^t\}_{t=1}^{\infty}$ converges to the unique fixed point. ■

Remark 2: We can alternatively derive a sufficient condition for a contraction map by using $\|\cdot\|_\infty$ in (13). We have

$$\|J^{BR}\|_\infty = \max_{i} \sum_{k=1}^{K} |J_{ik}^{BR}| \le \frac{1}{2} + \frac{1}{2} \max_{i} \frac{K - 1}{a_i}. \tag{15}$$

Therefore, if $a_k > K - 1$, $\forall k \in \mathcal{K}$, Algorithm 1 also globally converges. However, it is easy to verify that this is a special case of the sufficient condition given by (11). In addition, we can see from (11) that, if the accumulated "aggressiveness" of the nodes in the entire network reaches a certain threshold, the global convergence property may not hold. However, if all the nodes back off adequately by choosing their algorithm parameters $\{a_k\}_{k=1}^K$ such that condition (11) is satisfied, Algorithm 1 globally converges.

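Remark 3: Under the sufficient condition in (11), by substituting $a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)/p_k^*$ into (11), the limiting points lie in the set

$$\Big\{ p^* \;\Big|\; \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{\prod_{j \in \mathcal{K}\setminus\{i\}}(1 - p_j^*)} < 1,\ \forall k \in \mathcal{K} \Big\}. \tag{16}$$

It is easy to check that this is a subset of $\{p^* = (p_1^*, \ldots, p_K^*) \mid \sum_{k=1}^{K} p_k^* < 1\}$ for $K > 2$, which verifies the intuition that the set that Algorithm 1 globally converges to should be a subset of the set of locally stable CE.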
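Before running Algorithm 1, a designer may want to check whether a given parameter choice meets the global convergence condition of Theorem 3. The following sketch is illustrative only; the helper names are hypothetical, and the bound computed is $\frac{1}{2} + \frac{1}{2}\max_k \sum_{i \neq k} 1/a_i$ on the induced 1-norm of the Jacobian.

```python
def satisfies_condition_11(a):
    """Check that the sum over i != k of 1/a_i is below 1 for every node k."""
    K = len(a)
    return all(
        sum(1.0 / a[i] for i in range(K) if i != k) < 1.0 for k in range(K)
    )


def one_norm_bound(a):
    """Upper bound on the induced 1-norm of the best-response Jacobian."""
    K = len(a)
    worst = max(sum(1.0 / a[i] for i in range(K) if i != k) for k in range(K))
    return 0.5 + 0.5 * worst


if __name__ == "__main__":
    for a in ([3.0, 3.0, 3.0], [1.5, 1.5, 1.5]):
        print(a, satisfies_condition_11(a), one_norm_bound(a))
```

A bound below 1 gives the contraction factor $q$ used in the proof, so it also indicates how quickly the iterates approach the unique CE.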

C. A Gradient Play Learning Algorithm 

The best-response based dynamics may lead to large fluctuations in the entire network, which may 
not be desirable if we want to avoid temporary system-wide instability. Therefore, in this subsection, we 
propose an alternative learning algorithm inspired by the gradient type dynamics, which has been well 
studied in the field of evolutionary biology [22] [23]. For example, in population genetics, the evolutionary 
dynamics resulting for a particular mutant's invasion fitness, i.e. its growth rate, are primarily governed 
by the fitness gradient. In other words, the population has a small probability of moving its phenotype [4] 
in the direction in which fitness is increasing, and this probability is proportional to the fitness gradient 
for possible mutants. This model has also been used to model fluid flow under a pressure gradient or the 
motion of organisms towards sites of higher nutrient concentration [24]. 

Motivated by the gradient dynamics, we consider the gradient play learning algorithm. At each iteration, each node updates its action gradually in the ascent direction of its conjectured utility function $u_k^t(\tilde{s}_k^t(p_k), p_k) = p_k[\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^{t-1}) - a_k(p_k - p_k^{t-1})]$. Specifically, at stage $t$, node $k$ chooses its transmission probability according to

$$p_k^t = \left[ p_k^{t-1} + \gamma_k \frac{\partial u_k^t(\tilde{s}_k^t(p_k), p_k)}{\partial p_k} \bigg|_{p_k = p_k^{t-1}} \right]_0^1, \tag{17}$$

in which $[x]_a^b$ means $\max\{\min\{x, b\}, a\}$. The engineering interpretation of this updating procedure is that each node will "mutate", i.e. update its transmission probability, along the gradient direction of its conjectured utility function. As long as the stepsize $\gamma_k$ is small enough, the entire network will "evolve" smoothly and temporary system-wide instability will not occur. This algorithm also resembles the technique of exponential smoothing in statistics [39]. In the following, we assume that all nodes use the same stepsize $\gamma_k = \gamma$, $\forall k \in \mathcal{K}$, and $0 < p_k^t < 1$. If $\gamma$ is sufficiently small, substituting the conjectured utility function into (17), we have

$$p_k^t = p_k^{t-1} + \gamma \Big\{ \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) - a_k p_k^{t-1} \Big\}. \tag{18}$$

pi=pt i +i{ n (i-p-- 1 )-^ 1 }- as) 

ieic\{k} 

The detailed description of the distributed gradient play learning mechanism is summarized in Algorithm 
2. As for Algorithm 1 , we investigate the stability and convergence of this gradient play learning algorithm. 



Algorithm 2: A Distributed Gradient Play Learning Algorithm for Random Access
1: Initialize: $t = 0$, stepsize $\gamma$, the transmission probability $p_k^0 \in [0, 1]$, and the parameter $a_k > 0$ in node $k$'s belief function, $\forall k \in \mathcal{K}$.
2: procedure
3: Locally at each node $k$, iterate through $t$:
4: Set $t \leftarrow t + 1$.
5: for all $k \in \mathcal{K}$ do
6: At stage $t$, $p_k^t \leftarrow \big[ p_k^{t-1} + \gamma \{ \prod_{i \in \mathcal{K}\setminus\{k\}} (1 - p_i^{t-1}) - a_k p_k^{t-1} \} \big]_0^1$.
7: end for
8: Node $k$ decides if it will transmit data with a probability $p_k^t$ (or equivalently, maintain a window size of $CW_k^t = 2/p_k^t - 1$) for all the time slots during stage $t$.
9: end procedure
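The gradient play update can be sketched in the same way as the best-response update. This is a hedged illustration rather than the authors' code: `gradient_play_step` and `run_gradient_play` are hypothetical names, the update applied is $p_k \leftarrow [p_k + \gamma(\prod_{i \neq k}(1-p_i) - a_k p_k)]_0^1$, and the values $a_k = 4$ and $\gamma = 0.05$ are arbitrary example settings.

```python
from math import prod


def gradient_play_step(p, a, gamma):
    """One synchronous stage of the gradient play update for all nodes."""
    new_p = []
    for k in range(len(p)):
        idle_others = prod(1.0 - p[i] for i in range(len(p)) if i != k)
        x = p[k] + gamma * (idle_others - a[k] * p[k])
        new_p.append(max(min(x, 1.0), 0.0))  # projection onto [0, 1]
    return new_p


def run_gradient_play(p0, a, gamma, stages=2000):
    """Iterate the update from p0 for a fixed number of stages."""
    p = list(p0)
    for _ in range(stages):
        p = gradient_play_step(p, a, gamma)
    return p


if __name__ == "__main__":
    print(run_gradient_play([0.9, 0.1, 0.5], [4.0, 4.0, 4.0], gamma=0.05))
```

With the same parameters, both update rules share the fixed points satisfying $a_k p_k = \prod_{i \neq k}(1 - p_i)$; the smaller the stepsize, the smoother and slower the trajectory.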



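1) Local Stability: First of all, the following theorem describes a stable CE set in $\mathcal{P}$ for Algorithm 2.

Theorem 4: For any $p^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$, if

$$\sum_{k=1}^{K} p_k^* < 1, \quad \text{or} \quad \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} < 1,\ \forall k \in \mathcal{K}, \tag{19}$$

and the stepsize $\gamma$ is sufficiently small, $p^*$ is a stable CE for Algorithm 2.

Proof: Consider the Jacobian matrix $J^{GP}$ of the self-mapping function in (18). We have $J_{ik}^{GP} = \partial p_i^t / \partial p_k^{t-1}$. As discussed above, for $p^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$ to be a fixed point of the self-mapping function in (18), $a_k$ must be set to $a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)/p_k^*$. It follows that

$$J_{ik}^{GP}\Big|_{p = p^*,\, a = a^*} = \begin{cases} 1 - \gamma \dfrac{\prod_{j \in \mathcal{K}\setminus\{k\}}(1 - p_j^*)}{p_k^*}, & \text{if } i = k, \\[2mm] -\gamma \prod_{j \in \mathcal{K}\setminus\{i,k\}}(1 - p_j^*), & \text{if } i \neq k. \end{cases} \tag{20}$$

$p^*$ is stable if and only if the eigenvalues $\{\lambda_k\}_{k=1}^K$ of matrix $J^{GP}$ are all inside the unit circle of the complex plane, i.e. $|\lambda_k| < 1$, $\forall k \in \mathcal{K}$. Recall that the spectral radius $\rho(J)$ of a matrix $J$ is the maximal absolute value of the eigenvalues [27]. Therefore, it is equivalent to prove that $\rho(J^{GP}) < 1$.

To a vector $w = (w_1, \ldots, w_K) \in \mathbb{R}^K$ with positive entries, we associate a weighted $\ell_\infty$ norm, defined as

$$\|x\|_\infty^w = \max_{i} \frac{|x_i|}{w_i}. \tag{21}$$

The vector norm $\|\cdot\|_\infty^w$ induces a matrix norm, defined by

$$\|A\|_\infty^w = \max_{i} \frac{1}{w_i} \sum_{k=1}^{K} |a_{ik}| w_k. \tag{22}$$

According to Proposition A.20 in [31], $\rho(J^{GP}) \le \|J^{GP}\|_\infty^w$. Consider the vector $w = (w_1, \ldots, w_K)$ in which $w_k = p_k^*(1 - p_k^*)$. We have

$$\|J^{GP}\|_\infty^w = \max_{k} \Big\{ 1 - \gamma \frac{\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] \Big\}. \tag{23}$$

Therefore, if $\sum_{k \in \mathcal{K}} p_k^* < 1$, there exists some $\beta > 0$ such that

$$\frac{\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] > \beta,\ \forall k \in \mathcal{K}. \tag{24}$$

If the stepsize $\gamma$ satisfies $0 < \gamma < 1/\beta$, we have

$$\|J^{GP}\|_\infty^w = \max_{k} \Big\{ 1 - \gamma \frac{\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)}{p_k^*} \Big[ 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_i^*}{1 - p_k^*} \Big] \Big\} < 1 - \gamma\beta < 1. \tag{25}$$

Since $\rho(J^{GP}) \le \|J^{GP}\|_\infty^w < 1$, all the eigenvalues of $J^{GP}$ must fall into the unit circle $|\lambda| < 1$. Therefore, $p^*$ is a stable CE. Similarly, by choosing $w = [1, \ldots, 1]$, we can show that, if

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{p_k^*}{1 - p_i^*} < 1,\ \forall k \in \mathcal{K}, \tag{26}$$

and $\gamma$ is sufficiently small, $p^*$ is also stable. ■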



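2) Global Convergence: As in the previous subsection, we derive in the following theorem a sufficient condition under which Algorithm 2 globally converges.

Theorem 5: Regardless of any initial value chosen for $\{p_k^0\}_{k=1}^K$, if the parameters $\{a_k\}_{k=1}^K$ in the belief functions $\{\tilde{s}_k\}_{k=1}^K$ satisfy

$$\sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} < 1,\ \forall k \in \mathcal{K}, \tag{27}$$

and the stepsize $\gamma$ is sufficiently small, Algorithm 2 converges to a unique CE.

Proof: For the self-mapping function in (18), the elements of its Jacobian matrix $J^{GP}$ satisfy

$$J_{ik}^{GP} = \begin{cases} 1 - \gamma a_k, & \text{if } i = k, \\ -\gamma \prod_{j \in \mathcal{K}\setminus\{i,k\}}(1 - p_j^{t-1}), & \text{if } i \neq k. \end{cases} \tag{28}$$

Consider the distance induced by the weighted $\ell_\infty$ norm in the Euclidean space. We have

$$\|p^t - \tilde{p}^t\|_\infty^w \le \|J^{GP}\|_\infty^w \cdot \|p^{t-1} - \tilde{p}^{t-1}\|_\infty^w. \tag{29}$$

Using $w = (1/a_1, \ldots, 1/a_K)$ for (29), we have

$$\|J^{GP}\|_\infty^w = \max_{k}\, a_k \Big\{ \frac{1 - \gamma a_k}{a_k} + \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{\gamma \prod_{j \in \mathcal{K}\setminus\{i,k\}}(1 - p_j^{t-1})}{a_i} \Big\} \le \max_{k} \Big\{ 1 - \gamma a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) \Big\}. \tag{30}$$

Therefore, if the condition in (27) is satisfied, there exists some $\beta > 0$ such that

$$a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) > \beta,\ \forall k \in \mathcal{K}. \tag{31}$$

If the stepsize $\gamma$ satisfies $0 < \gamma < 1/\beta$, we have

$$\|J^{GP}\|_\infty^w \le \max_{k} \Big\{ 1 - \gamma a_k \Big( 1 - \sum_{i \in \mathcal{K}\setminus\{k\}} \frac{1}{a_i} \Big) \Big\} < 1 - \gamma\beta < 1. \tag{32}$$

Therefore, there exist a constant $q \in [0, 1)$ and a positive $\epsilon$ such that $q = \|J^{GP}\|_\infty^w = 1 - \epsilon < 1$ and $\|p^t - \tilde{p}^t\|_\infty^w \le q \|p^{t-1} - \tilde{p}^{t-1}\|_\infty^w$. From the contraction mapping theorem [28], the self-mapping function in (18) has a unique fixed point and the sequence $\{p^t\}_{t=1}^{\infty}$ converges to the unique fixed point. ■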

Remark 4: Compare Theorems 4 and 5 with Theorems 2 and 3. We can see that, given the same target operating point $p^*$ or parameters $\{a_k\}_{k=1}^K$, Algorithm 2 exhibits similar properties to Algorithm 1 in terms of local stability and global convergence, provided that its stepsize $\gamma$ is sufficiently small. In other words, the limiting behavior of these two distinct bio-inspired dynamic mechanisms is similar. However, the two algorithms involve a design trade-off, and the desired learning algorithm should be chosen based on the specific system requirements on the speed of convergence and the performance fluctuation. Generally speaking, the best response learning algorithm converges fast, but it may cause temporary large fluctuations during the convergence process, which is not desirable for transporting constant-bit-rate applications. On the other hand, the gradient play learning algorithm with a small stepsize evolves smoothly at the cost of a slower convergence rate.

D. Alternative Interpretations of the Conjecture-based Learning Algorithms

In this section, we re-interpret the proposed algorithms using the backoff mechanism model in which the transmission probabilities change from time slot to time slot [37], which helps us to understand the key difference between the proposed algorithms and 802.11 DCF. The superscript $t$ in this subsection represents the numbering of the time slots. We define $T_k^t$ and $T_{-k}^t$ as the events that node $k$ transmits data at time slot $t$ and that any node in $\mathcal{K}\setminus\{k\}$ transmits data at time slot $t$, respectively. If $a_k > 1$, the RHS of (7) equals $p_k^{t-1}/2 + \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^{t-1})/(2a_k)$, and the best response update function in (7) can be rewritten as

$$p_k^t = \frac{p_k^{t-1}}{2} E\{1_{\{T_{-k}^{t-1} = 1\}} \mid p^{t-1}\} + \frac{1}{2a_k} E\{1_{\{T_k^{t-1} = 0\}} 1_{\{T_{-k}^{t-1} = 0\}} \mid p^{t-1}\} + \frac{1}{2}\Big(1 + \frac{1}{a_k}\Big) E\{1_{\{T_k^{t-1} = 1\}} 1_{\{T_{-k}^{t-1} = 0\}} \mid p^{t-1}\}, \tag{33}$$

where $1_a$ is an indicator function of event $a$ taking place, $E\{a \mid b\}$ is the expected value of $a$ given $b$, $E\{1_{\{T_k^{t-1} = 1\}} \mid p^{t-1}\} = p_k^{t-1}$, and $E\{1_{\{T_{-k}^{t-1} = 1\}} \mid p^{t-1}\} = 1 - \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^{t-1})$. According to (33), we can provide an alternative interpretation of the best-response update algorithm as follows. Consider the following update algorithm. At each time slot, if node $k$ observes that any other node attempts to transmit, i.e. it senses a busy channel, it reduces its transmission probability by a factor $1/2$. If no transmission attempt is made by any node in the system, node $k$ sets its transmission probability to $1/(2a_k)$. Otherwise, if node $k$ makes a successful transmission, it will transmit with probability $0.5(1 + 1/a_k)$ in the next time slot. We can see that equation (33) characterizes the expected trajectory of this alternative update mechanism.

Fig. 3 compares this new interpretation with the IEEE 802.11 DCF [12]. We can see that node $k$ behaves similarly in the best response algorithm and the IEEE 802.11 DCF if it made a transmission attempt in the previous time slot, and the fundamental difference between these two protocols is how node $k$ updates its action given that it did not transmit in the previous time slot. In DCF, $p_k^t$ is kept the same as $p_k^{t-1}$. However, as we can see from (33), the best response algorithm either performs back-off if the channel is busy or sets $p_k^t$ to $1/(2a_k)$ if the channel is free. This can also be intuitively interpreted from a biological perspective: if the channel is busy, meaning that other competitors are accessing the resource (transmission opportunity), the node can avoid a confrontation by becoming less aggressive (i.e. reducing its transmission probability); if, on the other hand, the system is idle and the resource is wasted, the node will consume the resource by setting its transmission probability to $1/(2a_k)$.

Remark 5: Both the fixed-point relation $a_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)/p_k^*$ and (33) intuitively explain the meaning of the algorithm parameters $\{a_k\}_{k=1}^K$. Note that the numerator of this relation, $\prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)$, represents the probability that transmitter $k$ experiences a contention-free environment at $p^*$. The value of $1/a_k$, i.e. the ratio between node $k$'s transmission probability $p_k^*$ and its contention-free probability, indicates the "aggressiveness" of this particular node at equilibrium. In addition, according to (33), the transmission probability $1/(2a_k)$ also reflects node $k$'s "aggressiveness" in selecting its transmission probability after it has sensed a free channel. It is straightforward to see that the selection of $\{a_k\}_{k=1}^K$ introduces a trade-off between the stability and the throughput of the network. First of all, large values of $\{a_k\}_{k=1}^K$ prevent nodes from transmitting at a higher channel access probability, and hence stabilize the system at the cost of reducing the throughput. On the other hand, lowering $\{a_k\}_{k=1}^K$ increases the nodes' transmission probabilities, which may improve the throughput performance. However, it can cause the conditions in (8) and (19) to fail, in which case the system becomes unstable. Therefore, the problems that we will investigate in the next subsection are which part of the throughput region can be achieved with stable CE and how the nodes can adaptively update their $\{a_k\}_{k=1}^K$ such that the system attains efficient and stable operating points.

Before proceeding to the next subsection, as for the best response algorithm, we present a reinterpretation of the gradient play. Equation (18) can be rewritten as

$$p_k^t = (1 - \gamma a_k)\, p_k^{t-1}\, E\{1_{\{T_{-k}^{t-1} = 1\}} \mid p^{t-1}\} + \big[ p_k^{t-1} + \gamma (1 - a_k p_k^{t-1}) \big] E\{1_{\{T_{-k}^{t-1} = 0\}} \mid p^{t-1}\}. \tag{34}$$

If $a_k > 1$, the interpretation of (34) is that at each time slot, if node $k$ senses a busy channel, it reduces its transmission probability by a factor $1 - \gamma a_k$; otherwise it increases its transmission probability by an amount $\gamma(1 - a_k p_k^{t-1})$. We can see that this interpretation of the gradient play learning resembles the well-known AIMD (Additive Increase Multiplicative Decrease) control algorithm, which has been widely applied in the context of congestion avoidance in computer networks due to its superior performance in terms of convergence and efficiency [29].
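Both per-slot reinterpretations can be checked numerically: averaging the slot-level rules over the channel events should reproduce the stage-level updates. The sketch below is an illustration under assumptions; the function names are hypothetical, and the event probabilities are those implied by independent transmissions (others busy with probability $1 - \prod_{i \neq k}(1-p_i)$).

```python
from math import prod


def idle_others(p, k):
    """Probability that no node other than k transmits in a slot."""
    return prod(1.0 - p[i] for i in range(len(p)) if i != k)


def best_response_slot_mean(p, a, k):
    """Expected next-slot probability under the three-case rule behind (33)."""
    s = idle_others(p, k)
    busy = 1.0 - s                 # another node transmits: halve p_k
    all_idle = (1.0 - p[k]) * s    # nobody transmits: jump to 1 / (2 a_k)
    success = p[k] * s             # only node k transmits: use (1 + 1/a_k) / 2
    return (busy * p[k] / 2.0
            + all_idle / (2.0 * a[k])
            + success * 0.5 * (1.0 + 1.0 / a[k]))


def gradient_slot_mean(p, a, k, gamma):
    """Expected next-slot probability under the AIMD-like rule behind (34)."""
    s = idle_others(p, k)
    decrease = (1.0 - gamma * a[k]) * p[k]          # others busy
    increase = p[k] + gamma * (1.0 - a[k] * p[k])   # others idle
    return (1.0 - s) * decrease + s * increase


if __name__ == "__main__":
    p, a = [0.2, 0.3, 0.4], [3.0, 4.0, 5.0]
    print(best_response_slot_mean(p, a, 0), gradient_slot_mean(p, a, 0, 0.05))
```

The first function should agree with $p_k/2 + \prod_{i \neq k}(1-p_i)/(2a_k)$ and the second with $p_k + \gamma(\prod_{i \neq k}(1-p_i) - a_k p_k)$, which is what the tests accompanying this sketch verify.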

E. Stability of the Throughput Region

The results in the previous subsections describe the values of $\{p_k\}_{k=1}^K$ and $\{a_k\}_{k=1}^K$ for which local stability and global convergence can be guaranteed in both Algorithms 1 and 2. This subsection directly investigates, for both algorithms, the stability of achievable operating points in the throughput region $\mathcal{T}$.

Lemma 6: The Pareto boundary of the throughput region $\mathcal{T}$ is the set of all points $\tau = (\tau_1, \ldots, \tau_K)$ such that $\tau_k = p_k \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i)$, where $p = (p_1, \ldots, p_K)$ is a vector satisfying $p \ge 0$ and $\sum_{k \in \mathcal{K}} p_k = 1$, and each such $\tau$ is determined by a unique such $p$.

Proof: See Theorem 1 in [30]. ■

Theorem 7: Regardless of the number of nodes in the network, for any Pareto-inefficient operating point $\tau^*$ in the throughput region $\mathcal{T}$, there always exists a belief configuration $\{a_k\}_{k=1}^K$ that stabilizes Algorithms 1 and 2 and achieves the throughput $\tau^*$. If $K > 2$, any Pareto-optimal operating point $\{p_k\}_{k=1}^K$ in $\mathcal{T}$ that satisfies $p_k > 0$, $\forall k \in \mathcal{K}$, is a stable CE for Algorithms 1 and 2.

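Proof: From Theorem 2, we know that $\sum_{k \in \mathcal{K}} p_k < 1$ is sufficient to guarantee that the corresponding CE is stable. Therefore, it suffices to check that any Pareto-inefficient operating point $\tau^*$ can be achieved with a joint transmission probability $p^* \in \mathcal{P}$ satisfying $\sum_{k \in \mathcal{K}} p_k^* < 1$.

Define the throughput region

$$\mathcal{T}(t) = \Big\{ \tau \;\Big|\; \exists\, p \in \mathcal{P} \text{ with } \sum_{k \in \mathcal{K}} p_k \le t \text{ such that } \tau_k = p_k \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i),\ \forall k \in \mathcal{K} \Big\}, \tag{35}$$

in which an additional constraint $\sum_{k \in \mathcal{K}} p_k \le t$ is imposed. We denote the Pareto boundary of $\mathcal{T}(t)$ as

$$\partial\mathcal{T}(t) = \{ \tau \mid \nexists\, \tau' \in \mathcal{T}(t) \text{ such that } \tau'_k \ge \tau_k,\ \forall k \in \mathcal{K} \text{ and } \tau'_k > \tau_k,\ \exists k \in \mathcal{K} \}. \tag{36}$$

Following the proof of Lemma 6, we can draw a similar conclusion: all the points on $\partial\mathcal{T}(t)$ satisfy $\sum_{k \in \mathcal{K}} p_k = t$. By Lemma 6, $\partial\mathcal{T}(1)$ corresponds to the Pareto boundary of $\mathcal{T}$. Note that $\partial\mathcal{T}(0) = \{0\}$. In other words, varying $t$ from 1 to 0 causes $\partial\mathcal{T}(t)$ to continuously shrink from the Pareto boundary of the throughput region $\mathcal{T}$ to the origin $0$. Therefore, for any Pareto-inefficient point $\tau^* \in \mathcal{T}$, there exists $0 \le t' < 1$ such that $\tau^*$ lies on $\partial\mathcal{T}(t')$, i.e. $\tau^*$ can be achieved with an action profile $p^*$ satisfying $\sum_{k \in \mathcal{K}} p_k^* = t' < 1$.

To prove that the points on the Pareto boundary are stable CE when $K > 2$, we need to show that the eigenvalues $\{\xi_n\}_{n=1}^K$ of the Jacobian matrices $J^{BR}$ and $J^{GP}$ are all inside the unit circle of the complex plane [28], i.e. $|\xi_n| < 1$, $\forall n \in \mathcal{K}$. Take the best response dynamics for example. To determine the eigenvalues of $J^{BR}$, we have

$$\det(\xi I - J^{BR}) = \begin{vmatrix} \xi - \frac{1}{2} & \frac{p_1}{2(1 - p_2)} & \cdots & \frac{p_1}{2(1 - p_K)} \\ \frac{p_2}{2(1 - p_1)} & \xi - \frac{1}{2} & \cdots & \frac{p_2}{2(1 - p_K)} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{p_K}{2(1 - p_1)} & \frac{p_K}{2(1 - p_2)} & \cdots & \xi - \frac{1}{2} \end{vmatrix}.$$

Since $\xi I - J^{BR}$ is the sum of a diagonal matrix and a rank-one matrix, the determinant evaluates to

$$\det(\xi I - J^{BR}) = \prod_{k=1}^{K} \Big( \xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)} \Big) \cdot \Big[ 1 + \sum_{k=1}^{K} \frac{\frac{p_k}{2(1 - p_k)}}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}} \Big].$$

Therefore, we can see that the eigenvalues of $J^{BR}$ are the roots of

$$\Big[ 1 + \sum_{k=1}^{K} \frac{\frac{p_k}{2(1 - p_k)}}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}} \Big] \cdot \prod_{k=1}^{K} \Big( \xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)} \Big) = 0. \tag{37}$$

Denote

$$f(\xi) = \sum_{k=1}^{K} \frac{\frac{p_k}{2(1 - p_k)}}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}}.$$

First, we assume that $p_i \neq p_j$, $\forall i \neq j$. Without loss of generality, consider $p_1 < p_2 < \cdots < p_K$. In this case, the eigenvalues of $J^{BR}$ are the roots of $f(\xi) = -1$. Note that $f(\xi)$ is a continuous function and it strictly decreases in $\big(-\infty, \frac{1}{2} + \frac{p_1}{2(1 - p_1)}\big)$, $\big(\frac{1}{2} + \frac{p_1}{2(1 - p_1)}, \frac{1}{2} + \frac{p_2}{2(1 - p_2)}\big)$, $\ldots$, $\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, and $\big(\frac{1}{2} + \frac{p_K}{2(1 - p_K)}, +\infty\big)$. We also have

$$\lim_{\xi \to (\frac{1}{2} + \frac{p_n}{2(1 - p_n)})^-} f(\xi) = -\infty, \quad \lim_{\xi \to (\frac{1}{2} + \frac{p_n}{2(1 - p_n)})^+} f(\xi) = +\infty, \quad n = 1, 2, \ldots, K,$$

and $\lim_{\xi \to -\infty} f(\xi) = \lim_{\xi \to +\infty} f(\xi) = 0$. Therefore, the roots of $f(\xi) = -1$ lie in $\big(-\infty, \frac{1}{2} + \frac{p_1}{2(1 - p_1)}\big)$, $\big(\frac{1}{2} + \frac{p_1}{2(1 - p_1)}, \frac{1}{2} + \frac{p_2}{2(1 - p_2)}\big)$, $\ldots$, $\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, respectively.

For the operating points on the Pareto boundary, we have $\sum_{k=1}^{K} p_k = 1$. It is easy to verify that $f(0) = -\sum_{k=1}^K p_k = -1$, i.e. $\xi_1 = 0$. Therefore,

$$\rho(J^{BR}) = \max_k |\xi_k| = \xi_K \in \Big( \frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})},\ \frac{1}{2} + \frac{p_K}{2(1 - p_K)} \Big). \tag{38}$$

To see that $\xi_K < 1$ for $0 < p_1 < p_2 < \cdots < p_K$ and $\sum_{k=1}^{K} p_k = 1$, we distinguish two cases:
1) If $p_K \le 0.5$, we have $\xi_K < \frac{1}{2} + \frac{p_K}{2(1 - p_K)} \le 1$;
2) If $p_K > 0.5$, we have $\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})} < 1$ and $\frac{1}{2} + \frac{p_K}{2(1 - p_K)} > 1$. Since $f(\xi)$ strictly decreases in $\big(\frac{1}{2} + \frac{p_{K-1}}{2(1 - p_{K-1})}, \frac{1}{2} + \frac{p_K}{2(1 - p_K)}\big)$, we have $\rho(J^{BR}) < 1$ if and only if $f(1) < -1$. In fact,

$$f(1) - f(0) = \sum_{k=1}^{K} \frac{p_k}{1 - 2p_k} + 1 = \sum_{k=1}^{K-1} p_k \Big( \frac{1}{1 - 2p_k} - \frac{1}{1 - 2\sum_{m=1}^{K-1} p_m} \Big) < 0. \tag{39}$$

The inequality holds because $\frac{1}{1 - 2p_k} < \frac{1}{1 - 2\sum_{m=1}^{K-1} p_m}$ for $k = 1, 2, \ldots, K - 1$ when $p_K > 0.5$, $0 < p_1 < p_2 < \cdots < p_K$, and $\sum_{k=1}^{K} p_k = 1$.

Second, we consider the cases in which $p_i = p_j$ for certain $i \neq j$. Suppose that $\{p_k\}_{k=1}^K$ take $M$ distinct values $\kappa_1, \ldots, \kappa_M$ and the number of $\{p_k\}_{k=1}^K$ that equal $\kappa_m$ is $n_m$. In this case, Equation (37) is reduced to

$$\Big[ 1 + \sum_{k=1}^{K} \frac{\frac{p_k}{2(1 - p_k)}}{\xi - \frac{1}{2} - \frac{p_k}{2(1 - p_k)}} \Big] \cdot \prod_{m=1}^{M} \Big( \xi - \frac{1}{2} - \frac{\kappa_m}{2(1 - \kappa_m)} \Big)^{n_m} = 0. \tag{40}$$

Hence, the equation $f(\xi) = -1$ has $K + M - \sum_{m=1}^{M} n_m$ roots in total, and $\xi = \frac{1}{2} + \frac{\kappa_m}{2(1 - \kappa_m)}$ is a root of (40) of multiplicity $n_m - 1$, $\forall m$. All these roots are the eigenvalues of matrix $J^{BR}$. Similarly, the roots of $f(\xi) = -1$ lie in $\big(-\infty, \frac{1}{2} + \frac{\kappa_1}{2(1 - \kappa_1)}\big)$, $\big(\frac{1}{2} + \frac{\kappa_1}{2(1 - \kappa_1)}, \frac{1}{2} + \frac{\kappa_2}{2(1 - \kappa_2)}\big)$, $\ldots$, $\big(\frac{1}{2} + \frac{\kappa_{M-1}}{2(1 - \kappa_{M-1})}, \frac{1}{2} + \frac{\kappa_M}{2(1 - \kappa_M)}\big)$. If $K > 2$, $\sum_{k=1}^{K} p_k = 1$ is still sufficient to guarantee that $f(1) < -1$. Therefore, $|\xi_k^{BR}| < 1$, $\forall k \in \mathcal{K}$. ■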
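The role of the function $f(\xi)$ in this proof can be illustrated numerically. The sketch below is a hedged example, not part of the original analysis: it assumes $f(\xi) = \sum_k c_k/(\xi - \frac{1}{2} - c_k)$ with $c_k = p_k/(2(1-p_k))$, and the name `f` and the sample operating points are arbitrary choices. It evaluates $f$ at $\xi = 0$ and $\xi = 1$ for points with $\sum_k p_k = 1$.

```python
def f(xi, p):
    """Evaluate f(xi) = sum_k c_k / (xi - 1/2 - c_k), with c_k = p_k / (2 (1 - p_k))."""
    total = 0.0
    for pk in p:
        c = pk / (2.0 * (1.0 - pk))
        total += c / (xi - 0.5 - c)
    return total


if __name__ == "__main__":
    three_nodes = [0.1, 0.3, 0.6]   # Pareto boundary point with K = 3
    two_nodes = [0.4, 0.6]          # Pareto boundary point with K = 2
    print(f(0.0, three_nodes), f(1.0, three_nodes))
    print(f(0.0, two_nodes), f(1.0, two_nodes))
```

For the three-node point, $f(1)$ falls strictly below $-1$, as the proof requires. For the two-node point, $f(1)$ equals $-1$ up to rounding, so $\xi = 1$ is an eigenvalue, which is consistent with the restriction $K > 2$ in the theorem.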

Fig. 4 compares the throughput performance among various game-theoretic solution concepts, including Nash equilibria, Pareto frontier, locally stable conjectural equilibria, and globally convergent conjectural equilibria, in random access games. As proven in Theorem 7, Fig. 4 shows that the entire space spanning between the Nash equilibria and the Pareto frontier essentially consists of stable conjectural equilibria. In addition, as discussed in Remark 3, the set of globally convergent CE is a subset of the stable CE set.

In practice, it is more important to construct algorithmic mechanisms to attain desirable CE that operate stably and close to the Pareto boundary. To this end, we develop an iterative algorithm and summarize it as Algorithm 3. Specifically, this algorithm has an inner loop and an outer loop. The inner loop adopts either Algorithm 1 or 2 to achieve convergence for fixed $\{a_k\}_{k=1}^K$. The algorithm initializes $a_k > |\mathcal{K}|$ such that it initially globally converges. After converging to a stable CE, the outer loop adaptively adjusts $\{a_k\}_{k=1}^K$ until the desired efficiency is attained. The outer loop updates $\{a_k\}_{k=1}^K$ in a multiplicative manner for two reasons. First, reducing $\{a_k\}_{k=1}^K$ individually increases $\{p_k\}_{k=1}^K$ and $\sum_{k=1}^K p_k$, and hence moves the operating point towards the Pareto boundary. Second, multiplying $\{a_k\}_{k=1}^K$ by the same discount factor can maintain weighted fairness among different nodes. Both reasons will be analytically explained in Section IV. It is also worth mentioning that individual nodes can measure the Pareto efficiency in a fully distributed manner during the outer loop iteration. For example, individual nodes can estimate the other nodes' transmission probabilities $\{p_k^t\}_{k=1}^K$ based on their local observations and determine whether the current operating point is close to the Pareto boundary by calculating $\sum_{k \in \mathcal{K}} p_k^t$. When the network size grows bigger, individually estimating different nodes' transmission probabilities becomes challenging. An alternative solution is that individual nodes can instead monitor their common observation of the aggregate throughput $\sum_{k \in \mathcal{K}} u_k^t$ and terminate the update of $\{a_k\}_{k=1}^K$ once the aggregate throughput starts to decrease.

Next, we discuss several implementation issues regarding Algorithm 3. First, it is not necessary that all the nodes update their parameters $\{a_k\}_{k=1}^K$ synchronously. However, these nodes need to maintain the same update frequency, e.g. each node updates its parameter after a certain number of time slots or seconds. As long as $\delta$ is small, the performance gap between the actual CE and the intended one will not be large. Moreover, in order to guarantee fairness, new incoming nodes need to know the real-time parameters of the existing nodes in the same traffic class. This initialization only needs to be done once, when the new nodes enter the cell, by tracking the evolution of the transmission probabilities of the nodes in the same traffic class.

Algorithm 3: Adaptive Distributed Learning Algorithm for Random Access
1: Initialize: stepsizes $\gamma$ and $\delta$, the transmission probability $p_k^0 \in [0, 1]$, and the parameter $a_k > |\mathcal{K}|$ in node $k$'s belief function, $\forall k \in \mathcal{K}$.
2: procedure
3: outer loop: For each node $k$, $a_k \leftarrow a_k(1 - \delta)$.
4: inner loop: Locally at each node $k$, use Algorithm 1 or 2 to update $p_k^t$.
5: until it converges.
6: until the aggregate throughput is maximized or $\sum_{k \in \mathcal{K}} p_k \approx 1$.
7: end procedure
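The two-loop structure of Algorithm 3 can be sketched as follows. This is a simplified illustration under several assumptions that are not in the original: the inner loop uses the best-response update and runs for a fixed number of stages instead of testing convergence, the stopping rule is $\sum_k p_k \ge$ `target` with a hypothetical threshold of 0.95, and the sketch first converges for the initial parameters before the first multiplicative decrease.

```python
from math import prod


def inner_loop(p, a, stages=500):
    """Best-response iterations for fixed parameters a (stand-in for Algorithm 1)."""
    for _ in range(stages):
        p = [
            min(p[k] / 2.0
                + prod(1.0 - p[i] for i in range(len(p)) if i != k) / (2.0 * a[k]),
                1.0)
            for k in range(len(p))
        ]
    return p


def adaptive_learning(a0, delta=0.05, target=0.95, max_outer=200):
    """Shrink all a_k by the factor (1 - delta) until sum(p) reaches the target."""
    a = list(a0)
    p = inner_loop([0.5] * len(a), a)   # converge once for the initial parameters
    for _ in range(max_outer):
        if sum(p) >= target:            # close enough to the Pareto boundary
            break
        a = [x * (1.0 - delta) for x in a]
        p = inner_loop(p, a)
    return a, p


if __name__ == "__main__":
    a, p = adaptive_learning([4.0, 4.0, 4.0])
    print(a, p, sum(p))
```

Because every node applies the same discount factor, the ratios between the parameters $a_k$ are preserved throughout the outer loop, which is the property the text relies on for weighted fairness.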



IV. Extensions to Heterogeneous Networks and Ad-hoc Networks

In this section, we first investigate how users with different quality-of-service requirements should initialize their belief functions and interact in the heterogeneous network setting, and show that the conjecture-based approaches approximately achieve weighted fairness. Furthermore, we discuss how the single-cell solution can be extended to the general ad-hoc network scenario, where only the devices within a certain neighborhood range impact each other's throughput.

A. Equilibrium Selection for Heterogeneous Networks

Consider a network with $N > 1$ different classes of nodes. Let $\phi_n$ denote the parameter that class-$n$ nodes choose for their conjectured utility functions (i.e. $a_k = \phi_n$ if node $k$ belongs to class $n$) and $\mathcal{F}_n$ denote the set of nodes that set their algorithm parameters to $\phi_n$, $1 \le n \le N$. At equilibrium, the transmission probabilities of the nodes in the same class are equal, denoted as $p_n$. Before we proceed, we first define weighted fairness for the random access game [32]. With each traffic class $n$, we associate a positive weight $\chi_n$. Then weighted fairness for the random access game requires

$$\forall i, j \in \{1, 2, \ldots, N\},\ \forall k \in \mathcal{F}_i,\ \forall k' \in \mathcal{F}_j: \quad \frac{E\{1_{\{T_{-k} = 0\}} 1_{\{T_k = 1\}}\}}{\chi_i} = \frac{E\{1_{\{T_{-k'} = 0\}} 1_{\{T_{k'} = 1\}}\}}{\chi_j}, \tag{41}$$

which means that the probability of a successful transmission attempt for traffic class $n$ is proportional to its weight $\chi_n$. By simple manipulation, we obtain the equivalent form of equation (41) [32]:

$$\forall i, j \in \{1, 2, \ldots, N\}: \quad \frac{p_i^{WF}}{\chi_i (1 - p_i^{WF})} = \frac{p_j^{WF}}{\chi_j (1 - p_j^{WF})}. \tag{42}$$

Recall that Theorem 1 showed how to choose $\{a_k\}_{k=1}^K$ given a desired operating point $\{p_k^*\}_{k=1}^K$ such that it is a CE. The following theorem indicates the quantitative relationship between the chosen algorithm parameters $\{\phi_n\}_{n=1}^N$, the sizes of the different classes $\{|\mathcal{F}_n|\}_{n=1}^N$, and the resulting steady-state transmission probabilities $\{p_n\}_{n=1}^N$. More importantly, it also shows that if the network size is large, the conjecture-based algorithms approximately achieve weighted fairness.

Theorem 8: Suppose that $\phi_n > 2$, $\forall 1 \le n \le N$. The achieved steady-state transmission probabilities $\{p_n\}_{n=1}^N$ are given by

$$p_n = \frac{1 - \sqrt{1 - \frac{4q}{\phi_n}}}{2}, \tag{43}$$

where $q$ satisfies

$$q = \prod_{n=1}^{N} \Bigg( \frac{1 + \sqrt{1 - \frac{4q}{\phi_n}}}{2} \Bigg)^{|\mathcal{F}_n|}. \tag{44}$$

Proof: As shown in Theorem 1, $a_k^* p_k^* = \prod_{i \in \mathcal{K}\setminus\{k\}}(1 - p_i^*)$. Denote $q = \prod_{i \in \mathcal{K}}(1 - p_i)$. Therefore, we obtain

$$\phi_n p_n (1 - p_n) = q,\ \forall 1 \le n \le N. \tag{45}$$

Since $\phi_n > 2$, we have $p_n < 0.5$. Such a root of the quadratic equation in (45) is given in (43). Note that $q = \prod_{i \in \mathcal{K}}(1 - p_i)$. Substituting (43) into this equality, we get (44).

We can verify that a unique $q$ satisfying the equality in (44) exists if $\phi_n > 2$, $\forall 1 \le n \le N$. This is because the RHS of (44) is feasible for $q \le \min_{1 \le n \le N}\{\phi_n\}/4$ and it is a strictly decreasing function in $q$. Meanwhile, the LHS of (44) is strictly increasing on $q \in [0, \min_{1 \le n \le N}\{\phi_n\}/4]$. Note that when $q = \min_{1 \le n \le N}\{\phi_n\}/4$,

$$\text{LHS of (44)} = \min_{1 \le n \le N} \frac{\phi_n}{4} > \frac{1}{2} \ge \text{RHS of (44)}, \tag{46}$$

if $\phi_n > 2$. Therefore, a unique $q \in [0, \min_{1 \le n \le N}\{\phi_n\}/4]$ that satisfies (44) exists. ■
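The monotonicity argument in this proof suggests a simple numerical procedure: bisect on $q$ and then recover the class probabilities. The sketch below is only an illustration under assumptions. The function names, the bisection itself, and the two-class example are not from the original; the equations used are $q = \prod_n \big((1 + \sqrt{1 - 4q/\phi_n})/2\big)^{|\mathcal{F}_n|}$ and $p_n = (1 - \sqrt{1 - 4q/\phi_n})/2$.

```python
from math import sqrt


def solve_q(phi, sizes, iters=200):
    """Bisection for q; phi[n] is the class parameter, sizes[n] the class size."""
    def rhs(q):
        out = 1.0
        for f, m in zip(phi, sizes):
            out *= ((1.0 + sqrt(max(1.0 - 4.0 * q / f, 0.0))) / 2.0) ** m
        return out

    lo, hi = 0.0, min(phi) / 4.0
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if mid < rhs(mid):   # left side still below right side: root lies above
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def class_probabilities(phi, sizes):
    """Return q and the steady-state transmission probability of each class."""
    q = solve_q(phi, sizes)
    return q, [(1.0 - sqrt(max(1.0 - 4.0 * q / f, 0.0))) / 2.0 for f in phi]


if __name__ == "__main__":
    q, p = class_probabilities([4.0, 8.0], [2, 3])  # hypothetical two-class network
    print(q, p)
```

In this example the class with the smaller parameter obtains the larger transmission probability, in line with the interpretation of $1/\phi_n$ as the "aggressiveness" of a class.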

Remark 6: Several intuitions and observations can be obtained from Theorem 8. First, the multiplicative decreasing update in Algorithm 3 aims to move the operating points towards the Pareto boundary. Second, a quantitative approximation of the relation between the steady-state transmission probability $p_n$ and the algorithm parameter $\phi_n$ of each traffic class can be derived if a large number of nodes coexist. Since $q \to 0$ when $|\mathcal{F}_n|$ is large, using the Taylor expansion, $p_n$ can be approximated as $q/\phi_n$, i.e. the steady-state transmission probability $p_n$ decays as the inverse first power of the parameter $\phi_n$ that indicates the "aggressiveness" of traffic class $n$. Finally, we also observe from (45) that, if $|\mathcal{F}_n|$ is large, $p_n \to 0$ and $1 - p_n \approx 1$. Therefore,

$$\forall i, j \in \{1, 2, \ldots, N\}: \quad \phi_i p_i (1 - p_i) = \phi_j p_j (1 - p_j) \ \Rightarrow\ \frac{\phi_i p_i}{1 - p_i} \approx \frac{\phi_j p_j}{1 - p_j}. \tag{47}$$

Equation (47) indicates that Algorithms 1 and 2 approximately achieve the weighted fairness given in (42) with weight $\chi_n = 1/\phi_n$. Moreover, it is worth mentioning that the weighted fairness is purely an implicit by-product of the conjecture-based approach and it can be sustained with stability. Therefore, Algorithm 3 chooses to multiply $\{a_k\}_{k=1}^K$ by the same discount factor $1 - \delta$ such that the weighted fairness can be maintained.

B. Extension to Ad-hoc Networks 

Consider a wireless ad-hoc network with a set $\mathcal{K} = \{1, 2, \ldots, K\}$ of distinct node pairs, as shown in Fig. 5. Each link (node pair) consists of one dedicated transmitter and one dedicated receiver. We assume that the transmission of a link is interfered by the transmission of another link if the distance between the receiver node of the former and the transmitter node of the latter is less than some threshold $D_{th}$ [9] [15]. For any node $i$, we define $I_i \subset \mathcal{K}$ as the set of nodes whose transmitters cause interference to the receiver of node $i$ and $O_i \subset \mathcal{K}$ as the set of nodes whose receivers get interfered by the transmitter of node $i$. For example, in Fig. 5, $I_1 = \{K\}$ and $O_1 = \{2, K\}$. Then, the throughput of node $k$ is

$$u_k(p) = p_k \prod_{i \in I_k} (1 - p_i). \tag{48}$$

In this scenario, the state, namely the contention measure signal, can be redefined according to $s_k = \prod_{i \in I_k}(1 - p_i)$. Applying the conjecture-based approach, we have the following conjectured utility function for node $k$:

$$u_k^t(\tilde{s}_k^t(p_k), p_k) = p_k \Big[ \prod_{i \in I_k} (1 - p_i^{t-1}) - a_k (p_k - p_k^{t-1}) \Big]. \tag{49}$$

Parallel to the theorems proven in Sections III-B and III-C, we have the following theorems on the stability and convergence of the conjecture-based bio-inspired learning algorithms in ad-hoc networks. These theorems can be shown similarly as in Section III, and hence, the proofs are omitted.

1) Stability and Convergence: 

Theorem 9: For any $\mathbf{p}^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$, if condition (50) holds, $\mathbf{p}^*$ is a stable CE for Algorithm 1 and for Algorithm 2 with sufficiently small $\gamma$.

Theorem 10: Regardless of the initial values chosen for $\{p_k^0\}_{k=1}^K$, if the parameters $\{a_k\}_{k=1}^K$ in the belief functions $\{\tilde{s}_k\}_{k=1}^K$ satisfy condition (51), Algorithm 1 and Algorithm 2 with sufficiently small $\gamma$ converge to a unique CE.

Remark 7: We observe that the sufficient conditions in Theorems 9 and 10 are more relaxed than those of the theorems in Section III. In contrast to the single-cell case, mutual interference is reduced in ad-hoc networks because of the larger geographical separation between nodes. Therefore, these nodes can potentially improve their throughput by increasing their transmission probabilities while still maintaining local stability as well as global convergence.

Remark 8: In ad-hoc networks, the parameters $\{a_k\}_{k=1}^K$ can be determined in a distributed fashion such that the sufficient conditions in Theorem 10 are satisfied. For example, consider the symmetric case where transmitter $i$ interferes with receiver $j$ if and only if transmitter $i$ can receive signals from receiver $j$. Each transmitter can listen to the channel and estimate $|O_k|$ by intercepting the ACK packets sent by the receivers of the nodes in set $O_k$. An alternative distributed solution is that each transmitter broadcasts its parameter $a_k$, and receiver $k$ aggregates the parameters $a_i$, $i \in I_k$, and notifies the nodes in set $I_k$ to adjust their parameters accordingly.

2) Stability of the Throughput Region: We also extend the stability analysis of the throughput region from the single-cell scenario to ad-hoc networks. The following lemma explicitly describes the Pareto frontier of the throughput region.



[Equations (50) and (51): conditions of Theorems 9 and 10; not recoverable from the source text.]

Lemma 11: The Pareto boundary of the throughput region $\mathcal{T}$ can be characterized as the set of points $\boldsymbol{\tau} = (\tau_1, \ldots, \tau_K)$ optimizing the weighted proportional fairness objective [33]:

$$\max \sum_{k=1}^{K} \omega_k \log \tau_k, \tag{52}$$

in which $\tau_k = p_k \prod_{i \in I_k} (1 - p_i)$, for all possible sets of positive link "weights" $\{\omega_k\}_{k=1}^K$. Specifically, for a particular weight combination $\{\omega_k\}_{k=1}^K$, the optimal $\mathbf{p}^*$ is given by

$$p_k^* = \frac{\omega_k}{\sum_{i \in O_k \cup \{k\}} \omega_i}. \tag{53}$$

Proof: See [33] for details.
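The closed form (53) follows from a short first-order argument. The derivation below is a sketch; it uses only the definition $\tau_k = p_k \prod_{i \in I_k}(1 - p_i)$ and the fact that $i \in I_k$ if and only if $k \in O_i$, which allows the double sum to be regrouped by transmitter.

```latex
\begin{align*}
\sum_{k} \omega_k \log \tau_k
  &= \sum_{k} \omega_k \log p_k + \sum_{k} \omega_k \sum_{i \in I_k} \log(1 - p_i) \\
  &= \sum_{k} \Big[ \omega_k \log p_k
     + \Big( \sum_{i \in O_k} \omega_i \Big) \log(1 - p_k) \Big], \\
\frac{\partial}{\partial p_k}:\quad
  & \frac{\omega_k}{p_k} - \frac{\sum_{i \in O_k} \omega_i}{1 - p_k} = 0
  \;\Longrightarrow\;
  p_k^* = \frac{\omega_k}{\omega_k + \sum_{i \in O_k} \omega_i}
        = \frac{\omega_k}{\sum_{i \in O_k \cup \{k\}} \omega_i}.
\end{align*}
```

Because the regrouped objective is a sum of terms that are each concave in a single $p_k$, the stationary point is the maximizer.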

Based on Lemma 11, we derive in the following theorem the necessary and sufficient condition under which a particular Pareto-efficient operating point is a stable CE for Algorithm 1. Similar results can be derived for Algorithm 2 with sufficiently small $\gamma$.

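Theorem 12: Suppose $\mathbf{p}^* = (p_1^*, \ldots, p_K^*) \in \mathcal{P}$ satisfies (53) and maximizes the problem in (52). The elements of the Jacobi matrix $\mathbf{J}$ at $\mathbf{p}^*$ satisfy

$$J_{ik} = \begin{cases} \dfrac{1}{2}, & \text{if } i = k, \\[4pt] -\dfrac{p_i^*}{2(1 - p_k^*)}, & \text{if } k \in I_i, \\[4pt] 0, & \text{otherwise.} \end{cases} \tag{54}$$

If $\rho(\mathbf{J}) < 1$, $\mathbf{p}^*$ is a stable CE for Algorithm 1.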

Remark 9: Theorem 12 generalizes the result in Theorem 7 from the single-cell scenario to ad-hoc networks. Consider the $\ell_1$ norm of $\mathbf{J}$ at $\mathbf{p}^*$. We have

$$\|\mathbf{J}\|_1 = \max_{k \in \mathcal{K}} \Big[ \frac{1}{2} + \sum_{i \in O_k} \frac{p_i^*}{2(1 - p_k^*)} \Big]. \tag{55}$$

In the single-cell case, $O_k = \mathcal{K} \setminus \{k\}$, $\forall k \in \mathcal{K}$, and $\|\mathbf{J}\|_1$ equals 1 for any Pareto-optimal operating point. Therefore, any Pareto-inefficient operating point can be achieved with stability because $\rho(\mathbf{J}) \le \|\mathbf{J}\|_1 < 1$. However, in ad-hoc networks, the form of the Jacobi matrix $\mathbf{J}$ depends on the actual network topology, and it is difficult to bound the spectral radius for a generic setting using particular matrix norms, such as the $\ell_1$ norm or the $\ell_\infty$ norm. Alternatively, according to Theorem 12, we will numerically test the stability of the Pareto-optimal operating points in the simulation section.
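The numerical stability test suggested by Theorem 12 can be sketched in a few lines. The Python below is a sketch under assumptions rather than the authors' code: the off-diagonal entry $-p_i^*/(2(1 - p_k^*))$ for $k \in I_i$ is derived here by treating Algorithm 1 as the unconstrained maximizer of the conjectured utility (49) and using the resulting equilibrium condition $a_i p_i^* = \prod_{j \in I_i}(1 - p_j^*)$. The function names and the repeated-squaring estimate of the spectral radius (Gelfand's formula) are likewise illustrative choices.

```python
import math

def pareto_point(w, O):
    """Transmission probabilities from (53): p_k = w_k / sum of w_i over O_k union {k}."""
    return [w[k] / (w[k] + sum(w[i] for i in O[k])) for k in range(len(w))]

def jacobian(p, I):
    """Assumed best-response Jacobian at a CE (derivation in the lead-in)."""
    K = len(p)
    J = [[0.0] * K for _ in range(K)]
    for i in range(K):
        J[i][i] = 0.5
        for k in I[i]:
            J[i][k] = -p[i] / (2.0 * (1.0 - p[k]))
    return J

def l1_norm(J):
    """Induced l1 norm: the largest absolute column sum."""
    K = len(J)
    return max(sum(abs(J[i][k]) for i in range(K)) for k in range(K))

def spectral_radius(J, squarings=30):
    """Estimate rho(J) as ||J^n||^(1/n) with n = 2**squarings (Gelfand's formula)."""
    K = len(J)
    M = [row[:] for row in J]
    log_norm = 0.0  # log of the largest |entry| of J^(2^s), kept in log form
    for _ in range(squarings):
        M = [[sum(M[r][j] * M[j][c] for j in range(K)) for c in range(K)]
             for r in range(K)]
        scale = max(abs(x) for row in M for x in row)
        if scale == 0.0:
            return 0.0  # nilpotent matrix
        M = [[x / scale for x in row] for row in M]  # renormalize to avoid underflow
        log_norm = 2.0 * log_norm + math.log(scale)
    return math.exp(log_norm / 2 ** squarings)

# Hypothetical single-cell example: three links, equal weights.
I = [{1, 2}, {0, 2}, {0, 1}]
O = [{1, 2}, {0, 2}, {0, 1}]
p_star = pareto_point([1.0, 1.0, 1.0], O)
J = jacobian(p_star, I)
rho = spectral_radius(J)
```

For this three-link single cell the $\ell_1$ norm evaluates to 1 up to rounding, in line with the remark, while the estimated spectral radius is about 0.75. An isolated pair of mutually interfering links with equal weights gives a spectral radius of 1 under the same assumptions.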



V. Numerical Simulations 

In this section, we numerically compare the performance of the existing 802.11 DCF protocol, the P-MAC protocol [32] and the proposed algorithms in this paper.

We first illustrate the evolution of the transmission probabilities under Algorithms 1 and 2. We simulate a single-cell network of 5 nodes. For each node, the initial transmission probability $p_k^0$ is uniformly distributed in [0, 1] and $a_k$ is uniformly distributed between 5 and 10. The stepsize in the gradient play is $\gamma = 0.02$. Fig. 6 compares the trajectories of the transmission probability updates of Algorithms 1 and 2 in a single realization, under the assumption that node $k$ can perfectly estimate the probability $\prod_{i \in \mathcal{K} \setminus \{k\}} (1 - p_i)$, $\forall k \in \mathcal{K}$. The best response update converges in around 8 iterations, whereas the gradient play follows a smoother trajectory and attains the same equilibrium after 35 iterations. In addition, to illustrate how individual nodes can adaptively adjust their algorithm parameters and improve their throughput, we simulate a scenario with two traffic classes. Each traffic class consists of 5 nodes, and the initial algorithm parameters of classes 1 and 2 are $\phi_1 = 30$ and $\phi_2 = 60$, respectively. The discount factor in Algorithm 3 is $\delta = 0.05$. The blue dotted curve in Fig. 7 indicates that the operating point moves towards the red Pareto boundary until the outer loop detects that the desired efficiency is reached.

In practice, packet transmission over wireless links, e.g., IEEE 802.11 WLANs, involves extra protocol overheads, such as inter-frame spaces and packet headers. Under these realistic communication scenarios, we compare various performance metrics, including throughput, fairness, convergence, and stability, between our proposed conjecture-based algorithms, the P-MAC protocol in [32], and the IEEE 802.11 DCF. To evaluate these metrics, the physical layer parameters need to be specified. In the simulation, we assume that each wireless device operates at the IEEE 802.11a PHY mode-8, and the key parameters are summarized in Table I. We assume no transmission errors, and the RTS/CTS mechanism is disabled.
The aggregate network throughput can be calculated using Bianchi's model [35]:

$$T = \frac{P_s L_d}{(1 - P_{tr}) T_{slt} + P_s T_s + P_{tr} T_c - P_s T_c}, \tag{56}$$

where $P_s = \sum_{n} |\mathcal{K}_n| \, p_n (1 - p_n)^{|\mathcal{K}_n| - 1} \prod_{m \neq n} (1 - p_m)^{|\mathcal{K}_m|}$ is the probability that a transmission occurring on the channel is successful, $P_{tr} = 1 - \prod_{n} (1 - p_n)^{|\mathcal{K}_n|}$ is the probability that at least one transmission attempt happens, $T_s$ is the average time of a successful transmission, and $T_c$ is the average duration of a collision. The detailed derivation of $T_s$ and $T_c$ from the network parameters in Table I can be found in [32], [35]. The parameters in P-MAC are set according to [32]. The contention window sizes in the IEEE 802.11 DCF are $CW_{\min} = 16$ and $CW_{\max} = 1024$. In Algorithm 3, individual nodes monitor the aggregate throughput to determine whether to adjust the parameter $a_k$. The numerical results are obtained using a MAC simulation program in [35]. Our comparison results are summarized as follows.
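As an illustration of how (56) is evaluated, the following Python sketch computes $P_s$, $P_{tr}$ and the resulting aggregate throughput for a set of traffic classes. It is a sketch under assumptions: the function names are hypothetical, and $T_s$ and $T_c$ are passed in as plain numbers because their derivation from Table I is given in [32], [35] rather than here. The example values are arbitrary and not taken from the paper's simulations.

```python
import math

def slot_probabilities(class_sizes, class_probs):
    """Return (P_s, P_tr) for classes with sizes |K_n| and probabilities p_n."""
    # Probability that nobody transmits in a slot.
    idle = math.prod((1.0 - p) ** n for n, p in zip(class_sizes, class_probs))
    p_tr = 1.0 - idle
    p_s = 0.0
    for n, (size, p) in enumerate(zip(class_sizes, class_probs)):
        # All nodes of the other classes stay silent.
        others_silent = math.prod(
            (1.0 - q) ** m
            for j, (m, q) in enumerate(zip(class_sizes, class_probs))
            if j != n)
        # Exactly one node of class n transmits.
        p_s += size * p * (1.0 - p) ** (size - 1) * others_silent
    return p_s, p_tr

def aggregate_throughput(p_s, p_tr, payload, t_slot, t_s, t_c):
    """Evaluate (56): payload delivered per unit of average slot duration."""
    denom = (1.0 - p_tr) * t_slot + p_s * t_s + (p_tr - p_s) * t_c
    return p_s * payload / denom

# Arbitrary illustrative numbers: one class with two nodes, p = 0.5.
p_s, p_tr = slot_probabilities([2], [0.5])
rate = aggregate_throughput(p_s, p_tr, payload=4.0, t_slot=1.0, t_s=2.0, t_c=2.0)
```

The denominator is the expected duration of one channel event: an idle slot, a successful transmission, or a collision, weighted by their respective probabilities.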
First, the throughput of the three algorithms is compared. We vary the total number of nodes $K$ from 4 to 50, in which $\lceil K/2 \rceil$ nodes carry class-1 traffic and the remaining nodes carry class-2 traffic. The positive weights of class-1 and class-2 are $\chi_1 = 1$ and $\chi_2 = 0.5$. The initial parameters in Algorithm 3 are chosen to be $\phi_1 = 3K/\chi_1$ and $\phi_2 = 3K/\chi_2$. As shown in Fig. 8, both the conjecture-based algorithm and P-MAC significantly outperform the IEEE 802.11 DCF. The IEEE 802.11 DCF achieves the lowest throughput because, lacking a mechanism to adapt the contention window size, it suffers more frequent packet collisions as the number of nodes increases. Notably, the conjectural equilibrium attained by Algorithm 3 achieves the maximum achievable throughput. It also outperforms P-MAC, because P-MAC uses an approximation to derive closed-form expressions for the transmission probabilities of the different traffic classes.

Next, we evaluate the short-term fairness of different protocols using the quantitative fairness index introduced in [32]:

$$\frac{\mu(T_k/\chi_n)}{\mu(T_k/\chi_n) + \sigma(T_k/\chi_n)}, \quad k \in \mathcal{K}_n, \tag{57}$$

in which $T_k$ denotes the throughput of node $k$ that belongs to traffic class $n$, and $\mu$ and $\sigma$ are, respectively, the mean and the standard deviation of $T_k/\chi_n$ over all the active data traffic flows. We simulate a transmission duration of 3 minutes. The stage duration in Algorithm 3 is set to 50 successful transmissions. As shown in Fig. 9, Algorithm 3 and P-MAC are comparable in their fairness performance, and the achieved fairness index is always above 0.95 regardless of the network configuration. On the other hand, the fairness performance of the 802.11 DCF is much poorer than that of the other two algorithms, because the DCF protocol provides no fairness guarantee.
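The fairness index in (57) is simple to compute from per-node throughput measurements. The Python sketch below is illustrative only: the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator is used.

```python
import statistics

def fairness_index(throughputs, weights):
    """mu / (mu + sigma) of the weight-normalized throughputs T_k / chi_n.

    weights[k] is the weight chi_n of the class that node k belongs to.
    """
    normalized = [t / w for t, w in zip(throughputs, weights)]
    mu = statistics.fmean(normalized)
    sigma = statistics.pstdev(normalized)  # population std: an assumption
    return mu / (mu + sigma)

# Perfect weighted fairness: class-2 nodes (weight 0.5) get half the throughput.
fair = fairness_index([2.0, 2.0, 1.0, 1.0], [1.0, 1.0, 0.5, 0.5])
# Unequal normalized throughputs lower the index.
skewed = fairness_index([1.0, 3.0], [1.0, 1.0])
```

The index equals 1 when every node obtains throughput exactly proportional to its class weight and decreases as the normalized throughputs spread out.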

Last, in order to compare the convergence and the stability of different protocols under time-varying traffic, we simulate a network in which the number of active nodes fluctuates over time. To cope with traffic fluctuation, we slightly modify the outer loop in Algorithm 3. Once some nodes join or leave the network (this can be detected either by tracking the contention signal $\prod_{k \in \mathcal{K}} (1 - p_k)$ or by estimating the total number of nodes in the network [36]), the adaptation of $a_k$ is activated. Specifically, if more nodes join the network, $a_k \leftarrow a_k(1 + \delta)$; otherwise, $a_k \leftarrow a_k(1 - \delta)$. At the beginning, $|\mathcal{K}_1| = |\mathcal{K}_2| = 25$. At stage 200, 15 class-1 and 15 class-2 nodes join the network. These nodes leave the network at the 400th stage. The algorithm parameter $a_k$ is updated every 5 stages, and the stepsize in the gradient play is $\gamma = 0.003$. Figs. 10 and 11 show the variation of the transmission probabilities for both traffic classes and the expected accumulative throughput over time. P-MAC does not converge due to the lack of feedback control, which agrees with the observation about the instability of P-MAC reported in [13]. In addition, the optimal transmission probabilities computed by P-MAC and the conjecture-based algorithms differ under the same network parameters because of the approximation used in P-MAC. As shown in Fig. 10, nodes deploying P-MAC transmit with higher probabilities than those deploying the conjecture-based algorithms, which creates a more congested environment. As a result, the accumulative throughput achieved by P-MAC is slightly lower than the optimal throughput. In contrast, the conjecture-based algorithms enable the nodes to adaptively tune their parameters to maximize the network throughput while maintaining the weighted fairness as well as the system stability. As shown in Figs. 10 and 11, during stages [200, 300] and [400, 470], both the best response and the gradient play autonomously adapt their parameters until they converge to the optimal operating point. As discussed before, the best response learning converges faster than the gradient play learning. As a quantitative measure of stability, the ratio of the standard deviations of the expected accumulative throughput in Fig. 11 between P-MAC and the conjecture-based algorithms is approximately 7, and the corresponding ratio for the actual achieved accumulative throughput is approximately 2. We can see that, thanks to the inherent feedback control mechanism, both bio-inspired learning algorithms exhibit better stability than P-MAC.
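The modified outer-loop rule described above, $a_k \leftarrow a_k(1 + \delta)$ when nodes join and $a_k \leftarrow a_k(1 - \delta)$ otherwise, can be written as a one-step helper. This is a minimal sketch: the function name and the example values are hypothetical, and it covers only a single adaptation step, not the detection of arrivals or the stopping rule of Algorithm 3.

```python
def adapt_parameters(a, nodes_joined, delta):
    """Scale every a_k by (1 + delta) after arrivals, by (1 - delta) otherwise."""
    factor = 1.0 + delta if nodes_joined else 1.0 - delta
    return [a_k * factor for a_k in a]

# Hypothetical values: two nodes, delta = 0.05.
after_join = adapt_parameters([25.0, 50.0], nodes_joined=True, delta=0.05)
after_leave = adapt_parameters([25.0, 50.0], nodes_joined=False, delta=0.05)
```

Because all parameters are scaled by the same factor, their ratios, and hence the weighted fairness among classes, are left unchanged by each step.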

We also simulate the evolution trajectory of the transmission probabilities of the proposed Algorithm 1 and the algorithm in [16]. Both algorithms are essentially best-response-based algorithms. Specifically, we consider a network with $K = 6$. The peak data rates of the nodes are $r_1 = 6$, $r_2 = 36$, $r_3 = 9$, $r_4 = 12$, $r_5 = 18$, and $r_6 = 54$, all in Mbps. We apply the algorithm in [16] to solve the following network utility maximization problem:

[Equation (58) is not recoverable from the source text.]

in which $\alpha = 2$. The optimal solution corresponds to the belief configuration $a_1 = 2.03$, $a_2 = 3.93$, $a_3 = 2.32$, $a_4 = 2.55$, $a_5 = 2.97$, and $a_6 = 4.74$. The trajectories of both algorithms are shown in Fig. 12. We can see that both algorithms converge very fast and oscillate in the neighborhood of the optimal solution after several iterations. However, as we discussed before, the algorithm in [16] requires individual nodes to decode all the received packet headers and estimate the transmission probabilities of the other nodes individually, which introduces a large internal computational overhead when the network size grows. In contrast, nodes deploying Algorithm 1 only have to estimate the probability of having a free channel, without decoding all the packets, which substantially reduces their computational effort.

We simulate the performance of the proposed algorithms in an ad-hoc network contained in a 100 m × 100 m square area. Nodes are placed at random within the square. Two nodes can interfere with each other if their distance is no more than 40 m, i.e., $D_{th} = 40$ m. We simulate three scenarios with $K \in \{10, 20, 40\}$ nodes. The Pareto-efficient point that we select is the operating point associated with the link weight vector $\omega_k = 1$ for $1 \le k \le K/2$ and $\omega_k = 0.5$ for $K/2 < k \le K$ in (53). We can see from Fig. 13 that $\rho(\mathbf{J}^{BR}) \le 1$ holds for all the simulated topologies. As shown in Fig. 13, in some realizations $\rho(\mathbf{J}^{BR}) = 1$, and hence the associated operating points are not asymptotically stable. This occurs when two nodes interfere with each other but neither interferes with, nor is interfered with by, any of the remaining nodes in the ad-hoc network. On the other hand, the stability improves as the number of nodes increases. As long as the density of nodes is sufficiently large, the stability of the conjecture-based algorithm at the Pareto-efficient operating point can be achieved. Figs. 14 and 15 show the evolution of the transmission probabilities and the accumulative throughput for the IEEE 802.11 DCF and Algorithm 1 in a 10-node ad-hoc network with a randomly generated topology. The trajectory of the IEEE 802.11 DCF is obtained using the model in [12]. The parameter $a_k$ in Algorithm 1 is chosen to be $|O_k|$. The intuition is that, if $|O_k| = 0$, node $k$ can transmit at the maximal probability without interfering with any node. On the other hand, if $|O_k|$ is large, node $k$ should back off adequately such that reciprocity can be established. As shown in the figures, Algorithm 1 converges faster and achieves higher throughput than DCF. Similar results have been observed in the other simulated topologies.

VI. Conclusion 

In this paper, we propose distributed learning solutions that enable autonomous nodes to improve their 
throughput performance in random access networks. It is well-known that whenever biological entities
behave selfishly and myopically, a tragedy of the commons might take place, which has also been observed
in the context of random access control. Hence, we investigate whether forming internal belief functions
and learning the impact of various actions can alter the interaction outcome among these intelligent nodes.
Specifically, two bio-inspired learning mechanisms are proposed to dynamically update individual nodes'
transmission probabilities. It is analytically proven that the entire throughput region essentially consists
of stable conjectural equilibria. In addition, we prove that the conjecture-based approach achieves the
weighted fairness for heterogeneous traffic classes and extend the distributed learning solutions to ad-hoc 
networks. Simulation results have shown that the proposed algorithms achieve significant performance 
improvement against existing protocols, including the IEEE 802.11 DCF and the P-MAC protocol, in 
terms of not only fairness and throughput but also convergence and stability. A potential future direction 
is to investigate how to detect and prevent misbehavior for these bio-inspired solutions.

Fig. 1. System model of a single cell.

Fig. 2. An illustration of the distributed learning process.

Fig. 3. Comparison between the best response learning and the IEEE 802.11 DCF ($p_{\max}$ is specified in the DCF protocol).

Fig. 4. Comparison among different solution concepts.



TABLE I

IEEE 802.11a PHY MODE-8 PARAMETERS

Parameters                              Value
Duration of an Idle Slot ($T_{slt}$)    9 µs
Duration of PHY Header ($T_{PHY}$)      20 µs
SIFS Time ($T_{SIFS}$)                  16 µs
DIFS Time ($T_{DIFS}$)                  34 µs
Propagation Delay ($T_d$)               1 µs
MAC Header ($L_{MAC}$)                  28 octets
Packet Payload Size ($L_d$)             2304 octets
ACK Frame Size ($L_{ACK}$)              14 octets
Data Rate ($R_t$)                       54 Mbps






Fig. 5. System model of ad hoc networks.

Fig. 6. Dynamics of Algorithms 1 and 2.

Fig. 7. The trajectory of Algorithm 3. Legend: trajectory of Algorithm 3; Pareto boundary.

Fig. 8. Comparison of the accumulative throughput in the IEEE 802.11 DCF, P-MAC, and conjecture-based algorithms. Error bars correspond to the standard deviation of the mean of the 100 measurements sampled at each point. The error bars in the remaining figures are as in this figure.

Fig. 9. Comparison of the achieved fairness of the IEEE 802.11 DCF, P-MAC, and Algorithm 3.

Fig. 10. The dynamics of the transmission probabilities in P-MAC and Algorithm 3.

Fig. 11. The dynamics of the accumulative throughput in P-MAC and Algorithm 3.

Algorithm 1 and the algorithm in [16].




Fig. 14. Transmission probabilities of Algorithm 1 and the IEEE 802.11 DCF in ad-hoc networks.

Fig. 15. Accumulative throughput of Algorithm 1 and the IEEE 802.11 DCF in ad-hoc networks.



